CN108230243B - Background blurring method based on salient region detection model - Google Patents
- Publication number: CN108230243B
- Application number: CN201810133575.9A
- Authority: CN (China)
- Prior art keywords: layer, size, deconvolution, convolutional, output
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T3/04: Context-preserving transformations, e.g. by using an importance map
- G06N3/045: Neural networks; combinations of networks
- G06T7/11: Segmentation; region-based segmentation
- G06T2207/10004: Still image; photographic image
Abstract
The invention discloses a background blurring method based on a salient region detection model, comprising the following steps: acquiring an original image; constructing a convolutional network as a salient region detection model to obtain a saliency map of the original image; training the saliency map in a fully connected conditional random field to obtain an optimized saliency map; binarizing or segmenting the optimized saliency map into a 0-1 matrix, from which a foreground index matrix and a background index matrix are obtained; blurring the original image globally with a distance-weighted average algorithm; and finally splicing the sharp foreground image with the blurred background image to generate the background-blurred result. The invention not only detects the complete salient region accurately but also yields a clear salient boundary, so that the characteristics of the foreground image are retained when the background is blurred and the image content of the foreground is not damaged.
Description
Technical Field
The invention relates to the technical field of digital image processing, in particular to a background blurring method based on a salient region detection model.
Background
Blurring the image background is a common operation in tasks such as image rendering, beautification and enhancement: it highlights the target object and fades the background information, thereby improving the visual effect. Some existing image-processing software handles this task well, but it requires the foreground region to be labelled manually, which consumes considerable manpower and is impractical for large-scale processing. Moreover, the blur diffusion patterns in the prior art are all regular shapes and adapt poorly to complicated and changeable image content. Existing automatic background blurring techniques are immature in foreground edge extraction, leading to unclear boundaries, erroneously cut regions, and similar defects.
Disclosure of Invention
The background blurring method based on the salient region detection model can detect the whole salient region and performs well under various complex conditions, including multiple salient objects and small-scale salient objects; it detects the complete salient region accurately and produces clear salient boundaries. The characteristics of the foreground image are therefore retained when the background is blurred, and the image content of the foreground is not damaged.
In order to achieve the purpose, the technical scheme of the invention is as follows: a background blurring method based on a salient region detection model comprises the following steps:
step S1: acquiring an original image;
step S2: constructing a saliency region detection model based on a convolutional neural network to obtain a saliency map of an original image;
step S3: putting the saliency map into a fully connected conditional random field for training to obtain an optimized saliency map;
step S4: binarizing or segmenting the optimized saliency map to obtain a 0-1 matrix SBM, and deriving a foreground index matrix IF and a background index matrix IB, defined as follows:
IF=SBM,IB=M×N-SBM
wherein M×N denotes the all-ones matrix with the same resolution as the original image;
step S5: utilizing a distance weighted average algorithm to realize global blurring of an original image to obtain an original blurred image;
step S6: extracting a clear foreground image from the original image by using a foreground index matrix IF, and extracting a blurred background image from the original blurred image by using a background index matrix IB; and finally, splicing the clear foreground image and the fuzzy background image to obtain a background blurring result.
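As a concrete illustration of step S4 (not part of the patent; the helper name and the 0.5 threshold are assumptions, since the patent does not specify how the binarization is thresholded), a minimal numpy sketch of building the index matrices is:

```python
import numpy as np

def index_matrices(saliency, threshold=0.5):
    """Step S4 sketch: binarize an optimized saliency map into a 0-1
    matrix SBM, then derive the foreground index matrix IF = SBM and the
    background index matrix IB = (all-ones matrix) - SBM."""
    sbm = (saliency >= threshold).astype(np.uint8)  # 0-1 matrix SBM
    i_f = sbm                                       # foreground index matrix IF
    i_b = np.ones_like(sbm) - sbm                   # background index matrix IB
    return i_f, i_b
```

Every pixel thus belongs to exactly one of the two index matrices, which is what allows steps S6's foreground and background extractions to tile the image without overlap.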
Further, the specific network structure of the salient region detection model is as follows:
the first layer is an input layer and inputs an original image;
The second layer is composed of two convolutional layers, wherein the first convolutional layer uses 64 convolutional kernels and has the size of (4, 4, 3), the second convolutional layer uses 64 convolutional kernels and has the size of (3, 3, 64), and the activation function is a ReLU function;
the third layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the fourth layer consists of two convolutional layers, where the first convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 64), the second convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 128), and the activation function is a ReLU function;
the fifth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the sixth layer consists of three convolutional layers, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 128), the second convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), the third convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the seventh layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the eighth layer consists of three convolutional layers, wherein the first convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), and the activation function is a ReLU function;
The ninth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the tenth layer consists of three convolutional layers, where the first convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), and the activation function is a ReLU function;
the eleventh layer is a pooling layer, the size is (3, 3), the size of the extended edge is 1, and the activation function is a ReLU function;
the twelfth layer consists of two convolutional layers, where the first convolutional layer uses 1024 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 1024), and the activation function is the ReLU function;
the thirteenth layer consists of two convolutional layers and a normalization layer, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 1024), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the fourteenth layer is a deconvolution module, wherein two inputs are the thirteenth layer output and the twelfth layer output respectively;
the fifteenth layer is a deconvolution module, wherein two inputs are respectively the fourteenth layer output and the eighth layer output;
The sixteenth layer is a deconvolution module, wherein the two inputs are the fifteenth layer output and the sixth layer output respectively;
the seventeenth layer is a deconvolution module, wherein two inputs are respectively the sixteenth layer output and the fourth layer output;
the eighteenth layer is a deconvolution module, wherein two inputs are respectively the seventeenth layer output and the second layer output;
the nineteenth layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (4, 4, 512) is used, the input of the convolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twentieth layer consists of two deconvolution layers and a shear layer, wherein the first deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the second deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-first layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the nineteenth layer output, 2 convolution kernels with the size of (4, 4, 2) are used, the input of the convolution layer is the fifteenth layer output, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
The twenty-second layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (16, 16, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-third layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-first layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is the sixteenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-fourth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
a twenty-fifth layer is composed of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-third layer output, 4 convolution kernels with the size of (4, 4, 4) are used, the input of the convolution layer is the seventeenth layer output, 1 convolution kernel with the size of (1, 1, 128) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
The twenty-sixth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (4, 4, 3), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-seventh layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is twenty-fifth layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is seventeenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-eighth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (2, 2, 4), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-ninth layer is composed of a cascade layer and a convolution layer, the cascade layer performs channel connection on outputs of the twenty-eighth layer, the twenty-sixth layer, the twenty-fourth layer, the twenty-second layer and the twentieth layer, the convolution layer uses 1 convolution kernel, the size is (1, 1, 5), the activation function is a Sigmoid function, and a final output result is obtained;
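As a rough aid to the layer list above (a hypothetical helper, not part of the patent), the following sketch traces the spatial size of the feature map through the four (2, 2) pooling layers of the encoder, assuming the convolutional layers preserve spatial size ('same' padding, an assumption, since the patent only states padding for the eleventh layer):

```python
def trace_encoder(h, w):
    """Trace feature-map spatial size through the (2, 2) pooling layers
    (the third, fifth, seventh and ninth layers); convolutional layers
    are assumed not to change spatial size."""
    sizes = [(h, w)]
    for _ in range(4):          # layers 3, 5, 7, 9: (2, 2) pooling
        h, w = h // 2, w // 2
        sizes.append((h, w))
    return sizes
```

For a 224×224 input this yields 112, 56, 28 and finally 14 pixels per side, which explains why the decoder needs five deconvolution modules and progressively larger shear layers to recover the original resolution.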
The deconvolution module consists of a deconvolution layer, a shear layer, an Eltwise layer and a normalization layer, and its specific structure is as follows: let the inputs be feature map C1 and feature map C2, of sizes (h1, w1, k1) and (h2, w2, k2) respectively, where feature map C1 is smaller than feature map C2. The first layer is a deconvolution layer using k2 convolution kernels of size (4, 4, k1); its activation function is a ReLU function and its input is feature map C1. The second layer is a shear layer, which crops the output of the previous layer to the size of feature map C2. The third layer is an Eltwise layer, which multiplies feature map C2 with the output of the previous layer pixel by pixel; the activation function is a ReLU function. The fourth layer is a normalization layer, which normalizes the output of the previous layer.
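A single-channel numpy sketch of the deconvolution module follows (not the patent's implementation: a fixed nearest-neighbour 2x upsampling stands in for the learned (4, 4, k1) stride-2 transposed convolution, channels are omitted, and max-normalization is assumed since the patent does not specify the normalization):

```python
import numpy as np

def deconv_module(c1, c2):
    """Deconvolution-module sketch. c1, c2: 2-D maps with c1 smaller
    than c2. Upsample c1, crop to c2's size, multiply pixel by pixel
    with c2, then normalize."""
    up = np.kron(c1, np.ones((2, 2)))   # stand-in for the deconvolution layer
    up = np.maximum(up, 0.0)            # ReLU activation
    h2, w2 = c2.shape
    cropped = up[:h2, :w2]              # shear layer: crop to C2's size
    fused = cropped * c2                # Eltwise layer: pixel-wise product
    fused = np.maximum(fused, 0.0)      # ReLU activation
    denom = fused.max()                 # normalization layer (assumed max-norm)
    return fused / denom if denom > 0 else fused
```

The pixel-wise product is the key design point: the upsampled deep map gates the shallower skip-connection map, so coarse saliency localization from the encoder suppresses background activations in the higher-resolution features.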
Further, the step S3 specifically includes:
the fully connected conditional random field convolves the saliency map in a fully connected manner, and the resulting output is fed into the conditional random field. Let x = (x1, x2, …, xn) denote the observed input data sequence and y = (y1, y2, …, yn) the state sequence; given the input sequence, the linear-chain CRF model defines the joint conditional probability of the state sequence as:
p(y | x) = (1/Z(x)) · exp( Σi Σk wk fk(yi-1, yi, x, i) )
wherein: Z(x) is a probability normalization factor conditioned on the input data sequence x; fk is an arbitrary feature function; wk is the weight of each feature function; and exp(·) is a strictly positive potential function.
Further, the step S5 specifically includes:
each pixel in the averaging neighbourhood is assigned a different weight according to its importance, and the weighted-average matrix is computed separately for each of the three RGB channels of the original image, thereby realizing global blurring of the original image.
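The patent does not give the exact weighting, so the sketch below (hypothetical) assumes each neighbour is weighted by 1/(1 + d), with d its distance from the centre pixel, normalized so the weights sum to one; applied to each RGB channel in turn, it realizes a distance-weighted average blur:

```python
import numpy as np

def distance_weighted_blur(channel, radius=2):
    """Blur one channel by a distance-weighted average over a
    (2*radius+1)^2 window. The 1/(1+d) weighting is an assumption:
    the patent only states that weights reflect pixel importance."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    kernel = 1.0 / (1.0 + np.hypot(ys, xs))
    kernel /= kernel.sum()                       # normalized weights
    h, w = channel.shape
    padded = np.pad(channel, radius, mode='edge')
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * radius + 1):             # accumulate weighted shifts
        for dx in range(2 * radius + 1):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out
```

Because the kernel is normalized, a uniform region is left unchanged; only regions with spatial variation are smoothed.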
Further, the step S6 specifically includes:
setting an original image and an original blurred image as IO and IB', respectively, extracting a clear foreground image ICF and a blurred background image IBB:
ICF(i,j)=IO(i,j)*IF(i,j)
IBB(i,j)=IB'(i,j)*IB(i,j)
where i is the x-axis coordinate, j is the y-axis coordinate,
and superposing the clear foreground image ICF and the fuzzy background image IBB to obtain a final image background blurring result.
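The extraction and superposition formulas above can be sketched as follows (hypothetical helper; single-channel numpy arrays for brevity, with * denoting element-wise multiplication as in the formulas):

```python
import numpy as np

def composite(io, ib_prime, i_f, i_b):
    """Step S6 sketch: ICF(i,j) = IO(i,j) * IF(i,j) keeps the sharp
    foreground, IBB(i,j) = IB'(i,j) * IB(i,j) keeps the blurred
    background, and their superposition is the final result."""
    icf = io * i_f          # clear foreground image ICF
    ibb = ib_prime * i_b    # blurred background image IBB
    return icf + ibb        # spliced background-blurred result
```

Since IF and IB are complementary 0-1 matrices, every output pixel comes from exactly one of the two source images, so the foreground content is passed through untouched.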
Compared with the prior art, the invention has the beneficial effects that:
(1) the method can detect the whole significant region, and has good performance in various complex conditions including a plurality of significant objects, small-scale significant objects and the like;
(2) the invention can not only accurately detect the complete salient region, but also has a clearer salient boundary. Therefore, the characteristics of the foreground image can be kept when the background is blurred, and the image content of the foreground image is not damaged.
Drawings
FIG. 1 is a schematic flow chart of a background blurring method based on a salient region detection model according to the present invention;
FIG. 2 is a graph illustrating comparison of results according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
As shown in fig. 1, the present invention provides a background blurring method based on a saliency region detection model, comprising the following steps:
step S1: acquiring an original image;
step S2: constructing a saliency region detection model based on a convolutional neural network to obtain a saliency map of an original image;
step S3: putting the saliency map into a fully connected conditional random field for training to obtain an optimized saliency map;
step S4: carrying out binarization or segmentation processing on the optimized significance map to obtain a 01 matrix SBM, and obtaining a foreground index matrix IF and a background index matrix IB, wherein the definitions are as follows:
IF=SBM,IB=M×N-SBM
wherein, M multiplied by N is a full 1 matrix with the same resolution as the original image;
step S5: utilizing a distance weighted average algorithm to realize global blurring of an original image to obtain an original blurred image;
step S6: extracting a clear foreground image from the original image by using a foreground index matrix IF, and extracting a blurred background image from the original blurred image by using a background index matrix IB; and finally, splicing the clear foreground image and the fuzzy background image to obtain a background blurring result.
The specific network structure of the salient region detection model is as follows:
the first layer is an input layer and inputs an original image;
the second layer is composed of two convolutional layers, wherein the first convolutional layer uses 64 convolutional kernels and has the size of (4, 4, 3), the second convolutional layer uses 64 convolutional kernels and has the size of (3, 3, 64), and the activation function is a ReLU function;
the third layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the fourth layer consists of two convolutional layers, where the first convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 64), the second convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 128), and the activation function is a ReLU function;
the fifth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the sixth layer consists of three convolutional layers, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 128), the second convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), the third convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the seventh layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the eighth layer consists of three convolutional layers, wherein the first convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), and the activation function is a ReLU function;
The ninth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the tenth layer consists of three convolutional layers, where the first convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), and the activation function is a ReLU function;
the eleventh layer is a pooling layer, the size is (3, 3), the size of the extended edge is 1, and the activation function is a ReLU function;
the twelfth layer consists of two convolutional layers, where the first convolutional layer uses 1024 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 1024), and the activation function is the ReLU function;
the thirteenth layer consists of two convolutional layers and a normalization layer, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 1024), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the fourteenth layer is a deconvolution module, wherein two inputs are the thirteenth layer output and the twelfth layer output respectively;
the fifteenth layer is a deconvolution module, wherein two inputs are respectively the fourteenth layer output and the eighth layer output;
The sixteenth layer is a deconvolution module, wherein the two inputs are the fifteenth layer output and the sixth layer output respectively;
the seventeenth layer is a deconvolution module, wherein two inputs are respectively the sixteenth layer output and the fourth layer output;
the eighteenth layer is a deconvolution module, wherein two inputs are respectively the seventeenth layer output and the second layer output;
the nineteenth layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (4, 4, 512) is used, the input of the convolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twentieth layer consists of two deconvolution layers and a shear layer, wherein the first deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the second deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-first layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the nineteenth layer output, 2 convolution kernels with the size of (4, 4, 2) are used, the input of the convolution layer is the fifteenth layer output, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
The twenty-second layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (16, 16, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-third layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-first layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is the sixteenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-fourth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
a twenty-fifth layer is composed of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-third layer output, 4 convolution kernels with the size of (4, 4, 4) are used, the input of the convolution layer is the seventeenth layer output, 1 convolution kernel with the size of (1, 1, 128) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
The twenty-sixth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (4, 4, 3), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-seventh layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is twenty-fifth layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is seventeenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-eighth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (2, 2, 4), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-ninth layer is composed of a cascade layer and a convolution layer, the cascade layer performs channel connection on outputs of the twenty-eighth layer, the twenty-sixth layer, the twenty-fourth layer, the twenty-second layer and the twentieth layer, the convolution layer uses 1 convolution kernel, the size is (1, 1, 5), the activation function is a Sigmoid function, and a final output result is obtained;
The deconvolution module consists of a deconvolution layer, a shear layer, an Eltwise layer and a normalization layer, and the specific structure of the deconvolution module is as follows: let inputs be respectively characteristic diagrams C1And feature map C2The sizes are respectively (h)1,w1,k1) And (h)2,w2,k2) And the characteristic diagram C1Is smaller than the feature map C2The first layer is a deconvolution layer, using k2A convolution kernel of size (4, 4, k)1) The activation function is a ReLU function, and the input is a feature map C1(ii) a The second layer is a shear layer according to the characteristic diagram C2The size of the C is cut for the output of the previous layer, the third layer is an Eltwise layer, and the characteristic diagram C is obtained2Multiplying the output of the previous layer by pixel, wherein the activation function is a ReLU function; the fourth layer is a normalization layer, and normalization operation is carried out on the output of the previous layer.
The step S3 specifically includes:
the fully-connected conditional random field obtains an output after the saliency map is convolved by a fully-connected mode, the output result is input into the conditional random field, if x ═ (x1, x2, …, xn) represents an observed input data sequence, y ═ (y1, y2, …, yn) represents a state sequence, and under the condition that an input sequence is given, the joint conditional probability of the CRF model of the linear chain defining the state sequence is as follows:
Wherein: z is a probability normalization factor conditioned on the input data sequence x; f is an arbitrary characteristic function; w is the weight of each feature function,is a strictly positive potential function.
The step S5 specifically includes:
according to the difference of the importance of each pixel point, different weight numbers are respectively given to average, and the weighted average matrix of three pixel matrixes is respectively solved for the RGB three channels of the original image, so that the global blurring of the original image is realized.
The step S6 specifically includes:
Let the original image and the original blurred image be IO and IB' respectively; the clear foreground image ICF and the blurred background image IBB are extracted as:
ICF(i,j)=IO(i,j)*IF(i,j)
IBB(i,j)=IB'(i,j)*IB(i,j)
where i is the x-axis coordinate, j is the y-axis coordinate, and * denotes pixel-wise multiplication.
The clear foreground image ICF and the blurred background image IBB are then superposed to obtain the final image background blurring result.
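Assuming IF is the 0-1 foreground mask from step S4 and IB its complement, the extraction and superposition above reduce to element-wise array operations; a minimal NumPy sketch:

```python
import numpy as np

def compose(io, ib_blurred, sbm):
    """Combine the sharp foreground of IO with the blurred background of IB'.
    sbm is the 0-1 saliency mask; IF = sbm, IB = 1 - sbm."""
    i_f = sbm[..., None]               # foreground index matrix, broadcast over RGB
    i_b = 1.0 - i_f                    # background index matrix
    icf = io * i_f                     # clear foreground image ICF
    ibb = ib_blurred * i_b             # blurred background image IBB
    return icf + ibb                   # superposition = background blurring result

io = np.random.rand(4, 4, 3)           # original image IO
ib = np.zeros((4, 4, 3))               # stand-in for the blurred image IB'
sbm = np.zeros((4, 4)); sbm[:2] = 1.0  # toy saliency mask: top half is foreground
result = compose(io, ib, sbm)
```

Because the two masks are complementary, every pixel comes from exactly one of the two source images.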
Fig. 2 is a comparison graph of background blurring results obtained by applying the method of the present invention, wherein the left side is an original image, and the right side is a background blurring result graph.
The above description is only of the preferred embodiments of the present invention, and the present invention is not limited to the above embodiments. It is to be understood that other modifications and variations directly derived or suggested to those skilled in the art without departing from the spirit and scope of the present invention are to be considered as included within the scope of the present invention.
Claims (4)
1. A background blurring method based on a salient region detection model is characterized by comprising the following steps:
step S1: acquiring an original image;
step S2: constructing a saliency region detection model based on a convolutional neural network to obtain a saliency map of an original image;
step S3: putting the saliency map into a fully connected conditional random field for training to obtain an optimized saliency map;
step S4: carrying out binarization segmentation processing on the optimized saliency map to obtain a 0-1 matrix SBM, and obtaining a foreground index matrix IF and a background index matrix IB, which are defined as:
IF=SBM,IB=M×N-SBM
wherein M×N is an all-ones matrix with the same resolution as the original image;
step S5: utilizing a distance weighted average algorithm to realize global blurring of an original image to obtain an original blurred image;
step S6: extracting a clear foreground image from the original image by using the foreground index matrix IF, and extracting a blurred background image from the original blurred image by using the background index matrix IB; finally, the clear foreground image and the blurred background image are superposed to obtain the background blurring result;
the specific network structure of the salient region detection model is as follows:
the first layer is an input layer and inputs an original image;
the second layer is composed of two convolutional layers, wherein the first convolutional layer uses 64 convolutional kernels and has the size of (4, 4, 3), the second convolutional layer uses 64 convolutional kernels and has the size of (3, 3, 64), and the activation function is a ReLU function;
The third layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the fourth layer consists of two convolutional layers, where the first convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 64), the second convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 128), and the activation function is a ReLU function;
the fifth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the sixth layer consists of three convolutional layers, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 128), the second convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), the third convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the seventh layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the eighth layer consists of three convolutional layers, wherein the first convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), and the activation function is a ReLU function;
the ninth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
The tenth layer consists of three convolutional layers, where the first convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), and the activation function is a ReLU function;
the eleventh layer is a pooling layer, the size is (3, 3), the size of the extended edge is 1, and the activation function is a ReLU function;
the twelfth layer consists of two convolutional layers, where the first convolutional layer uses 1024 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 1024), and the activation function is the ReLU function;
the thirteenth layer consists of two convolutional layers and a normalization layer, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 1024), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the fourteenth layer is a deconvolution module, wherein two inputs are the thirteenth layer output and the twelfth layer output respectively;
the fifteenth layer is a deconvolution module, wherein two inputs are respectively the fourteenth layer output and the eighth layer output;
the sixteenth layer is a deconvolution module, wherein the two inputs are the fifteenth layer output and the sixth layer output respectively;
The seventeenth layer is a deconvolution module, wherein two inputs are respectively the sixteenth layer output and the fourth layer output;
the eighteenth layer is a deconvolution module, wherein two inputs are respectively the seventeenth layer output and the second layer output;
the nineteenth layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (4, 4, 512) is used, the input of the convolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twentieth layer consists of two deconvolution layers and a shear layer, wherein the first deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the second deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-first layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the output of the nineteenth layer, 2 convolution kernels with the size of (4, 4, 2) are used, the input of the convolution layer is the output of the fifteenth layer, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
The twenty-second layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (16, 16, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-third layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-first layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is the sixteenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-fourth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
a twenty-fifth layer is composed of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-third layer output, 4 convolution kernels with the size of (4, 4, 4) are used, the input of the convolution layer is the seventeenth layer output, 1 convolution kernel with the size of (1, 1, 128) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
The twenty-sixth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (4, 4, 3), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-seventh layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is twenty-fifth layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is seventeenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-eighth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (2, 2, 4), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-ninth layer is composed of a cascade layer and a convolution layer, the cascade layer performs channel connection on outputs of the twenty-eighth layer, the twenty-sixth layer, the twenty-fourth layer, the twenty-second layer and the twentieth layer, the convolution layer uses 1 convolution kernel, the size is (1, 1, 5), the activation function is a Sigmoid function, and a final output result is obtained;
The deconvolution module consists of a deconvolution layer, a shear layer, an Eltwise layer and a normalization layer; its specific structure is as follows: let the inputs be feature map C1 and feature map C2, of sizes (h1, w1, k1) and (h2, w2, k2) respectively, where feature map C1 is smaller than feature map C2. The first layer is a deconvolution layer using k2 convolution kernels of size (4, 4, k1), with a ReLU activation function; its input is feature map C1. The second layer is a shear layer, which crops the output of the previous layer to the size of feature map C2. The third layer is an Eltwise layer, which multiplies feature map C2 with the output of the previous layer pixel by pixel; the activation function is a ReLU function. The fourth layer is a normalization layer, which performs a normalization operation on the output of the previous layer.
2. The salient region detection model-based background blurring method according to claim 1,
the step S3 specifically includes:
the fully-connected conditional random field obtains an output by convolving the saliency map in a fully-connected manner, and this output is fed into the conditional random field. Let x = (x1, x2, …, xn) denote the observed input data sequence and y = (y1, y2, …, yn) the state sequence. Given the input sequence, the joint conditional probability of the linear-chain CRF model over the state sequence is defined as:

p(y | x) = (1 / Z(x)) · exp( Σi Σk wk · fk(y(i-1), yi, x, i) )
3. The background blurring method based on the salient region detection model according to claim 1, wherein the step S5 specifically includes:
different weights are assigned to each pixel point for averaging according to its importance, and the weighted average matrix of the pixel matrix is computed separately for each of the three RGB channels of the original image, thereby realizing global blurring of the original image.
4. The background blurring method based on the salient region detection model according to claim 1, wherein the step S6 specifically includes:
Let the original image and the original blurred image be IO and IB' respectively; the clear foreground image ICF and the blurred background image IBB are extracted as:
ICF(i,j)=IO(i,j)*IF(i,j)
IBB(i,j)=IB'(i,j)*IB(i,j)
where i is the x-axis coordinate, j is the y-axis coordinate, and * denotes pixel-wise multiplication.
The clear foreground image ICF and the blurred background image IBB are then superposed to obtain the final image background blurring result.
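As a quick sanity check on the spatial sizes implied by the network of claim 1, the sketch below traces a hypothetical 224×224 input (the input resolution is an assumption; the patent does not fix one) through the four (2, 2) pooling layers:

```python
def trace_pool_sizes(h, w, num_pools=4):
    """Track the feature-map size through the four (2, 2) pooling layers
    (layers 3, 5, 7 and 9 of the detection model), each halving resolution."""
    sizes = [(h, w)]
    for _ in range(num_pools):
        h, w = h // 2, w // 2
        sizes.append((h, w))
    return sizes

print(trace_pool_sizes(224, 224))
# [(224, 224), (112, 112), (56, 56), (28, 28), (14, 14)]
```

The progressively smaller maps are what the deconvolution modules of layers fourteen through eighteen upsample back toward the original resolution.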
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810133575.9A CN108230243B (en) | 2018-02-09 | 2018-02-09 | Background blurring method based on salient region detection model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108230243A CN108230243A (en) | 2018-06-29 |
CN108230243B true CN108230243B (en) | 2021-04-27 |
Legal Events

- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant
- CF01: Termination of patent right due to non-payment of annual fee (granted publication date: 2021-04-27; termination date: 2022-02-09)