CN108230243B - Background blurring method based on salient region detection model - Google Patents
- Publication number: CN108230243B
- Application number: CN201810133575.9A
- Authority: CN (China)
- Prior art keywords: layer, size, deconvolution, convolutional, output
- Legal status: Expired - Fee Related (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T3/04: Context-preserving transformations, e.g. by using an importance map
- G06N3/045: Neural networks; combinations of networks
- G06T7/11: Segmentation; region-based segmentation
- G06T2207/10004: Still image; photographic image
Abstract
The invention discloses a background blurring method based on a salient region detection model, comprising the following steps: acquiring an original image; constructing a convolutional network as a salient region detection model to obtain a saliency map of the original image; training the saliency map in a fully connected conditional random field to obtain an optimized saliency map; binarizing or segmenting the optimized saliency map into a 0-1 matrix, from which a foreground index matrix and a background index matrix are obtained; blurring the original image globally with a distance-weighted average algorithm; and finally splicing the sharp foreground image with the blurred background image to generate the background-blurred result. The invention not only detects the complete salient region accurately but also yields a clear salient boundary, so that the characteristics of the foreground image are retained when the background is blurred and the image content of the foreground is not damaged.
Description
Technical Field
The invention relates to the technical field of digital image processing, in particular to a background blurring method based on a salient region detection model.
Background
Blurring the image background is a common operation in tasks such as image rendering, beautification and enhancement: it highlights the target object and fades the background information, thereby improving the visual effect. Some existing image-processing software handles this task well, but it requires the foreground region to be labelled manually, which consumes considerable manpower and is impractical for large-scale processing. Moreover, the blur diffusion patterns in the prior art are all regular shapes and adapt poorly to complicated and changeable image content. Existing automatic background blurring techniques are immature in foreground edge extraction, leading to unclear boundaries, erroneously cut regions, and similar defects.
Disclosure of Invention
The background blurring method based on the salient region detection model can detect the whole salient region and performs well under various complex conditions, including multiple salient objects and small-scale salient objects; it detects the complete salient region accurately and produces clear salient boundaries. The characteristics of the foreground image are therefore retained when the background is blurred, and the image content of the foreground is not damaged.
In order to achieve the purpose, the technical scheme of the invention is as follows: a background blurring method based on a salient region detection model comprises the following steps:
step S1: acquiring an original image;
step S2: constructing a saliency region detection model based on a convolutional neural network to obtain a saliency map of an original image;
step S3: putting the saliency map into a fully connected conditional random field for training to obtain an optimized saliency map;
step S4: binarizing or segmenting the optimized saliency map to obtain a 0-1 matrix SBM, and deriving a foreground index matrix IF and a background index matrix IB, defined as follows:
IF=SBM,IB=M×N-SBM
wherein M×N denotes the all-ones matrix with the same resolution as the original image;
step S5: utilizing a distance weighted average algorithm to realize global blurring of an original image to obtain an original blurred image;
step S6: extracting a clear foreground image from the original image by using a foreground index matrix IF, and extracting a blurred background image from the original blurred image by using a background index matrix IB; and finally, splicing the clear foreground image and the fuzzy background image to obtain a background blurring result.
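As a concrete illustration of step S4 (not part of the patent; the helper name and the 0.5 threshold are assumptions, since the patent does not specify how the binarization is thresholded), a minimal numpy sketch of building the index matrices is:

```python
import numpy as np

def index_matrices(saliency, threshold=0.5):
    """Step S4 sketch: binarize an optimized saliency map into a 0-1
    matrix SBM, then derive the foreground index matrix IF = SBM and the
    background index matrix IB = (all-ones matrix) - SBM."""
    sbm = (saliency >= threshold).astype(np.uint8)  # 0-1 matrix SBM
    i_f = sbm                                       # foreground index matrix IF
    i_b = np.ones_like(sbm) - sbm                   # background index matrix IB
    return i_f, i_b
```

Every pixel thus belongs to exactly one of the two index matrices, which is what allows steps S6's foreground and background extractions to tile the image without overlap.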
Further, the specific network structure of the salient region detection model is as follows:
the first layer is an input layer and inputs an original image;
The second layer is composed of two convolutional layers, wherein the first convolutional layer uses 64 convolutional kernels and has the size of (4, 4, 3), the second convolutional layer uses 64 convolutional kernels and has the size of (3, 3, 64), and the activation function is a ReLU function;
the third layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the fourth layer consists of two convolutional layers, where the first convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 64), the second convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 128), and the activation function is a ReLU function;
the fifth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the sixth layer consists of three convolutional layers, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 128), the second convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), the third convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the seventh layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the eighth layer consists of three convolutional layers, wherein the first convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), and the activation function is a ReLU function;
The ninth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the tenth layer consists of three convolutional layers, where the first convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), and the activation function is a ReLU function;
the eleventh layer is a pooling layer, the size is (3, 3), the size of the extended edge is 1, and the activation function is a ReLU function;
the twelfth layer consists of two convolutional layers, where the first convolutional layer uses 1024 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 1024), and the activation function is the ReLU function;
the thirteenth layer consists of two convolutional layers and a normalization layer, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 1024), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the fourteenth layer is a deconvolution module, wherein two inputs are the thirteenth layer output and the twelfth layer output respectively;
the fifteenth layer is a deconvolution module, wherein two inputs are respectively the fourteenth layer output and the eighth layer output;
The sixteenth layer is a deconvolution module, wherein the two inputs are the fifteenth layer output and the sixth layer output respectively;
the seventeenth layer is a deconvolution module, wherein two inputs are respectively the sixteenth layer output and the fourth layer output;
the eighteenth layer is a deconvolution module, wherein two inputs are respectively the seventeenth layer output and the second layer output;
the nineteenth layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (4, 4, 512) is used, the input of the convolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twentieth layer consists of two deconvolution layers and a shear layer, wherein the first deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the second deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-first layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the nineteenth layer output, 2 convolution kernels with the size of (4, 4, 2) are used, the input of the convolution layer is the fifteenth layer output, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
The twenty-second layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (16, 16, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-third layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-first layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is the sixteenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-fourth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
a twenty-fifth layer is composed of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-third layer output, 4 convolution kernels with the size of (4, 4, 4) are used, the input of the convolution layer is the seventeenth layer output, 1 convolution kernel with the size of (1, 1, 128) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
The twenty-sixth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (4, 4, 3), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-seventh layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is twenty-fifth layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is seventeenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-eighth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (2, 2, 4), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-ninth layer is composed of a cascade layer and a convolution layer, the cascade layer performs channel connection on outputs of the twenty-eighth layer, the twenty-sixth layer, the twenty-fourth layer, the twenty-second layer and the twentieth layer, the convolution layer uses 1 convolution kernel, the size is (1, 1, 5), the activation function is a Sigmoid function, and a final output result is obtained;
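As a rough aid to the layer list above (a hypothetical helper, not part of the patent), the following sketch traces the spatial size of the feature map through the four (2, 2) pooling layers of the encoder, assuming the convolutional layers preserve spatial size ('same' padding, an assumption, since the patent only states padding for the eleventh layer):

```python
def trace_encoder(h, w):
    """Trace feature-map spatial size through the (2, 2) pooling layers
    (the third, fifth, seventh and ninth layers); convolutional layers
    are assumed not to change spatial size."""
    sizes = [(h, w)]
    for _ in range(4):          # layers 3, 5, 7, 9: (2, 2) pooling
        h, w = h // 2, w // 2
        sizes.append((h, w))
    return sizes
```

For a 224×224 input this yields 112, 56, 28 and finally 14 pixels per side, which explains why the decoder needs five deconvolution modules and progressively larger shear layers to recover the original resolution.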
The deconvolution module consists of a deconvolution layer, a shear layer, an Eltwise layer and a normalization layer, and its specific structure is as follows: let the inputs be feature map C1 and feature map C2, of sizes (h1, w1, k1) and (h2, w2, k2) respectively, where feature map C1 is smaller than feature map C2. The first layer is a deconvolution layer using k2 convolution kernels of size (4, 4, k1); its activation function is a ReLU function and its input is feature map C1. The second layer is a shear layer, which crops the output of the previous layer to the size of feature map C2. The third layer is an Eltwise layer, which multiplies feature map C2 with the output of the previous layer pixel by pixel; the activation function is a ReLU function. The fourth layer is a normalization layer, which normalizes the output of the previous layer.
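A single-channel numpy sketch of the deconvolution module follows (not the patent's implementation: a fixed nearest-neighbour 2x upsampling stands in for the learned (4, 4, k1) stride-2 transposed convolution, channels are omitted, and max-normalization is assumed since the patent does not specify the normalization):

```python
import numpy as np

def deconv_module(c1, c2):
    """Deconvolution-module sketch. c1, c2: 2-D maps with c1 smaller
    than c2. Upsample c1, crop to c2's size, multiply pixel by pixel
    with c2, then normalize."""
    up = np.kron(c1, np.ones((2, 2)))   # stand-in for the deconvolution layer
    up = np.maximum(up, 0.0)            # ReLU activation
    h2, w2 = c2.shape
    cropped = up[:h2, :w2]              # shear layer: crop to C2's size
    fused = cropped * c2                # Eltwise layer: pixel-wise product
    fused = np.maximum(fused, 0.0)      # ReLU activation
    denom = fused.max()                 # normalization layer (assumed max-norm)
    return fused / denom if denom > 0 else fused
```

The pixel-wise product is the key design point: the upsampled deep map gates the shallower skip-connection map, so coarse saliency localization from the encoder suppresses background activations in the higher-resolution features.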
Further, the step S3 specifically includes:
the fully connected conditional random field convolves the saliency map in a fully connected manner, and the resulting output is fed into the conditional random field. Let x = (x1, x2, …, xn) denote the observed input data sequence and y = (y1, y2, …, yn) the state sequence; given the input sequence, the linear-chain CRF model defines the joint conditional probability of the state sequence as:
p(y | x) = (1/Z(x)) · exp( Σi Σk wk fk(yi-1, yi, x, i) )
wherein: Z(x) is a probability normalization factor conditioned on the input data sequence x; fk is an arbitrary feature function; wk is the weight of each feature function; and exp(·) is a strictly positive potential function.
Further, the step S5 specifically includes:
each pixel in the averaging neighbourhood is assigned a different weight according to its importance, and the weighted-average matrix is computed separately for each of the three RGB channels of the original image, thereby realizing global blurring of the original image.
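The patent does not give the exact weighting, so the sketch below (hypothetical) assumes each neighbour is weighted by 1/(1 + d), with d its distance from the centre pixel, normalized so the weights sum to one; applied to each RGB channel in turn, it realizes a distance-weighted average blur:

```python
import numpy as np

def distance_weighted_blur(channel, radius=2):
    """Blur one channel by a distance-weighted average over a
    (2*radius+1)^2 window. The 1/(1+d) weighting is an assumption:
    the patent only states that weights reflect pixel importance."""
    ys, xs = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    kernel = 1.0 / (1.0 + np.hypot(ys, xs))
    kernel /= kernel.sum()                       # normalized weights
    h, w = channel.shape
    padded = np.pad(channel, radius, mode='edge')
    out = np.zeros((h, w), dtype=float)
    for dy in range(2 * radius + 1):             # accumulate weighted shifts
        for dx in range(2 * radius + 1):
            out += kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
    return out
```

Because the kernel is normalized, a uniform region is left unchanged; only regions with spatial variation are smoothed.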
Further, the step S6 specifically includes:
setting an original image and an original blurred image as IO and IB', respectively, extracting a clear foreground image ICF and a blurred background image IBB:
ICF(i,j)=IO(i,j)*IF(i,j)
IBB(i,j)=IB'(i,j)*IB(i,j)
where i is the x-axis coordinate, j is the y-axis coordinate,
and superposing the clear foreground image ICF and the fuzzy background image IBB to obtain a final image background blurring result.
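The extraction and superposition formulas above can be sketched as follows (hypothetical helper; single-channel numpy arrays for brevity, with * denoting element-wise multiplication as in the formulas):

```python
import numpy as np

def composite(io, ib_prime, i_f, i_b):
    """Step S6 sketch: ICF(i,j) = IO(i,j) * IF(i,j) keeps the sharp
    foreground, IBB(i,j) = IB'(i,j) * IB(i,j) keeps the blurred
    background, and their superposition is the final result."""
    icf = io * i_f          # clear foreground image ICF
    ibb = ib_prime * i_b    # blurred background image IBB
    return icf + ibb        # spliced background-blurred result
```

Since IF and IB are complementary 0-1 matrices, every output pixel comes from exactly one of the two source images, so the foreground content is passed through untouched.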
Compared with the prior art, the invention has the beneficial effects that:
(1) the method can detect the whole significant region, and has good performance in various complex conditions including a plurality of significant objects, small-scale significant objects and the like;
(2) the invention can not only accurately detect the complete salient region, but also has a clearer salient boundary. Therefore, the characteristics of the foreground image can be kept when the background is blurred, and the image content of the foreground image is not damaged.
Drawings
FIG. 1 is a schematic flow chart of a background blurring method based on a salient region detection model according to the present invention;
FIG. 2 is a graph illustrating comparison of results according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
As shown in fig. 1, the present invention provides a background blurring method based on a saliency region detection model, comprising the following steps:
step S1: acquiring an original image;
step S2: constructing a saliency region detection model based on a convolutional neural network to obtain a saliency map of an original image;
step S3: putting the saliency map into a fully connected conditional random field for training to obtain an optimized saliency map;
step S4: carrying out binarization or segmentation processing on the optimized significance map to obtain a 01 matrix SBM, and obtaining a foreground index matrix IF and a background index matrix IB, wherein the definitions are as follows:
IF=SBM,IB=M×N-SBM
wherein, M multiplied by N is a full 1 matrix with the same resolution as the original image;
step S5: utilizing a distance weighted average algorithm to realize global blurring of an original image to obtain an original blurred image;
step S6: extracting a clear foreground image from the original image by using a foreground index matrix IF, and extracting a blurred background image from the original blurred image by using a background index matrix IB; and finally, splicing the clear foreground image and the fuzzy background image to obtain a background blurring result.
The specific network structure of the salient region detection model is as follows:
the first layer is an input layer and inputs an original image;
the second layer is composed of two convolutional layers, wherein the first convolutional layer uses 64 convolutional kernels and has the size of (4, 4, 3), the second convolutional layer uses 64 convolutional kernels and has the size of (3, 3, 64), and the activation function is a ReLU function;
the third layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the fourth layer consists of two convolutional layers, where the first convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 64), the second convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 128), and the activation function is a ReLU function;
the fifth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the sixth layer consists of three convolutional layers, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 128), the second convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), the third convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the seventh layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the eighth layer consists of three convolutional layers, wherein the first convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), and the activation function is a ReLU function;
The ninth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the tenth layer consists of three convolutional layers, where the first convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), and the activation function is a ReLU function;
the eleventh layer is a pooling layer, the size is (3, 3), the size of the extended edge is 1, and the activation function is a ReLU function;
the twelfth layer consists of two convolutional layers, where the first convolutional layer uses 1024 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 1024), and the activation function is the ReLU function;
the thirteenth layer consists of two convolutional layers and a normalization layer, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 1024), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the fourteenth layer is a deconvolution module, wherein two inputs are the thirteenth layer output and the twelfth layer output respectively;
the fifteenth layer is a deconvolution module, wherein two inputs are respectively the fourteenth layer output and the eighth layer output;
The sixteenth layer is a deconvolution module, wherein the two inputs are the fifteenth layer output and the sixth layer output respectively;
the seventeenth layer is a deconvolution module, wherein two inputs are respectively the sixteenth layer output and the fourth layer output;
the eighteenth layer is a deconvolution module, wherein two inputs are respectively the seventeenth layer output and the second layer output;
the nineteenth layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (4, 4, 512) is used, the input of the convolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twentieth layer consists of two deconvolution layers and a shear layer, wherein the first deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the second deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-first layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the nineteenth layer output, 2 convolution kernels with the size of (4, 4, 2) are used, the input of the convolution layer is the fifteenth layer output, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
The twenty-second layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (16, 16, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-third layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-first layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is the sixteenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-fourth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
a twenty-fifth layer is composed of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-third layer output, 4 convolution kernels with the size of (4, 4, 4) are used, the input of the convolution layer is the seventeenth layer output, 1 convolution kernel with the size of (1, 1, 128) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
The twenty-sixth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (4, 4, 3), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-seventh layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is twenty-fifth layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is seventeenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-eighth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (2, 2, 4), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-ninth layer is composed of a cascade layer and a convolution layer, the cascade layer performs channel connection on outputs of the twenty-eighth layer, the twenty-sixth layer, the twenty-fourth layer, the twenty-second layer and the twentieth layer, the convolution layer uses 1 convolution kernel, the size is (1, 1, 5), the activation function is a Sigmoid function, and a final output result is obtained;
The deconvolution module consists of a deconvolution layer, a shear layer, an Eltwise layer and a normalization layer, and the specific structure of the deconvolution module is as follows: let inputs be respectively characteristic diagrams C1And feature map C2The sizes are respectively (h)1,w1,k1) And (h)2,w2,k2) And the characteristic diagram C1Is smaller than the feature map C2The first layer is a deconvolution layer, using k2A convolution kernel of size (4, 4, k)1) The activation function is a ReLU function, and the input is a feature map C1(ii) a The second layer is a shear layer according to the characteristic diagram C2The size of the C is cut for the output of the previous layer, the third layer is an Eltwise layer, and the characteristic diagram C is obtained2Multiplying the output of the previous layer by pixel, wherein the activation function is a ReLU function; the fourth layer is a normalization layer, and normalization operation is carried out on the output of the previous layer.
The step S3 specifically includes:
the fully-connected conditional random field obtains an output after the saliency map is convolved by a fully-connected mode, the output result is input into the conditional random field, if x ═ (x1, x2, …, xn) represents an observed input data sequence, y ═ (y1, y2, …, yn) represents a state sequence, and under the condition that an input sequence is given, the joint conditional probability of the CRF model of the linear chain defining the state sequence is as follows:
Wherein: z is a probability normalization factor conditioned on the input data sequence x; f is an arbitrary characteristic function; w is the weight of each feature function,is a strictly positive potential function.
The step S5 specifically includes:
according to the difference of the importance of each pixel point, different weight numbers are respectively given to average, and the weighted average matrix of three pixel matrixes is respectively solved for the RGB three channels of the original image, so that the global blurring of the original image is realized.
The step S6 specifically includes:
Let the original image and the original blurred image be IO and IB' respectively; the clear foreground image ICF and the blurred background image IBB are extracted as:
ICF(i,j)=IO(i,j)*IF(i,j)
IBB(i,j)=IB'(i,j)*IB(i,j)
where i is the x-axis coordinate, j is the y-axis coordinate, and * denotes pixel-wise multiplication.
The clear foreground image ICF and the blurred background image IBB are then superposed to obtain the final image background blurring result.
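Assuming IF is the 0-1 foreground mask from step S4 and IB its complement, the extraction and superposition above reduce to element-wise array operations; a minimal NumPy sketch:

```python
import numpy as np

def compose(io, ib_blurred, sbm):
    """Combine the sharp foreground of IO with the blurred background of IB'.
    sbm is the 0-1 saliency mask; IF = sbm, IB = 1 - sbm."""
    i_f = sbm[..., None]               # foreground index matrix, broadcast over RGB
    i_b = 1.0 - i_f                    # background index matrix
    icf = io * i_f                     # clear foreground image ICF
    ibb = ib_blurred * i_b             # blurred background image IBB
    return icf + ibb                   # superposition = background blurring result

io = np.random.rand(4, 4, 3)           # original image IO
ib = np.zeros((4, 4, 3))               # stand-in for the blurred image IB'
sbm = np.zeros((4, 4)); sbm[:2] = 1.0  # toy saliency mask: top half is foreground
result = compose(io, ib, sbm)
```

Because the two masks are complementary, every pixel comes from exactly one of the two source images.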
Fig. 2 is a comparison graph of background blurring results obtained by applying the method of the present invention, wherein the left side is an original image, and the right side is a background blurring result graph.
The above description is only of the preferred embodiments of the present invention, and the present invention is not limited to the above embodiments. It is to be understood that other modifications and variations directly derived or suggested to those skilled in the art without departing from the spirit and scope of the present invention are to be considered as included within the scope of the present invention.
Claims (4)
1. A background blurring method based on a salient region detection model is characterized by comprising the following steps:
step S1: acquiring an original image;
step S2: constructing a saliency region detection model based on a convolutional neural network to obtain a saliency map of an original image;
step S3: putting the saliency map into a fully connected conditional random field for training to obtain an optimized saliency map;
step S4: carrying out binarization segmentation processing on the optimized saliency map to obtain a 0-1 matrix SBM, and obtaining a foreground index matrix IF and a background index matrix IB, which are defined as:
IF=SBM,IB=M×N-SBM
wherein M×N is an all-ones matrix with the same resolution as the original image;
step S5: utilizing a distance weighted average algorithm to realize global blurring of an original image to obtain an original blurred image;
step S6: extracting a clear foreground image from the original image by using the foreground index matrix IF, and extracting a blurred background image from the original blurred image by using the background index matrix IB; finally, the clear foreground image and the blurred background image are superposed to obtain the background blurring result;
the specific network structure of the salient region detection model is as follows:
the first layer is an input layer and inputs an original image;
the second layer is composed of two convolutional layers, wherein the first convolutional layer uses 64 convolutional kernels and has the size of (4, 4, 3), the second convolutional layer uses 64 convolutional kernels and has the size of (3, 3, 64), and the activation function is a ReLU function;
The third layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the fourth layer consists of two convolutional layers, where the first convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 64), the second convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 128), and the activation function is a ReLU function;
the fifth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the sixth layer consists of three convolutional layers, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 128), the second convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), the third convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the seventh layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the eighth layer consists of three convolutional layers, wherein the first convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), and the activation function is a ReLU function;
the ninth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
The tenth layer consists of three convolutional layers, where the first convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), and the activation function is a ReLU function;
the eleventh layer is a pooling layer, the size is (3, 3), the size of the extended edge is 1, and the activation function is a ReLU function;
the twelfth layer consists of two convolutional layers, where the first convolutional layer uses 1024 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 1024), and the activation function is the ReLU function;
the thirteenth layer consists of two convolutional layers and a normalization layer, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 1024), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the fourteenth layer is a deconvolution module, wherein two inputs are the thirteenth layer output and the twelfth layer output respectively;
the fifteenth layer is a deconvolution module, wherein two inputs are respectively the fourteenth layer output and the eighth layer output;
the sixteenth layer is a deconvolution module, wherein the two inputs are the fifteenth layer output and the sixth layer output respectively;
The seventeenth layer is a deconvolution module, wherein two inputs are respectively the sixteenth layer output and the fourth layer output;
the eighteenth layer is a deconvolution module, wherein two inputs are respectively the seventeenth layer output and the second layer output;
the nineteenth layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (4, 4, 512) is used, the input of the convolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twentieth layer consists of two deconvolution layers and a shear layer, wherein the first deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the second deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-first layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the output of the nineteenth layer, 2 convolution kernels with the size of (4, 4, 2) are used, the input of the convolution layer is the output of the fifteenth layer, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
The twenty-second layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (16, 16, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-third layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-first layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is the sixteenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-fourth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
a twenty-fifth layer is composed of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-third layer output, 4 convolution kernels with the size of (4, 4, 4) are used, the input of the convolution layer is the seventeenth layer output, 1 convolution kernel with the size of (1, 1, 128) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
The twenty-sixth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (4, 4, 3), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-seventh layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is twenty-fifth layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is seventeenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-eighth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (2, 2, 4), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-ninth layer is composed of a cascade layer and a convolution layer, the cascade layer performs channel connection on outputs of the twenty-eighth layer, the twenty-sixth layer, the twenty-fourth layer, the twenty-second layer and the twentieth layer, the convolution layer uses 1 convolution kernel, the size is (1, 1, 5), the activation function is a Sigmoid function, and a final output result is obtained;
The deconvolution module consists of a deconvolution layer, a shear layer, an Eltwise layer and a normalization layer; its specific structure is as follows: let the inputs be feature map C1 and feature map C2, of sizes (h1, w1, k1) and (h2, w2, k2) respectively, where feature map C1 is smaller than feature map C2. The first layer is a deconvolution layer using k2 convolution kernels of size (4, 4, k1), with a ReLU activation function; its input is feature map C1. The second layer is a shear layer, which crops the output of the previous layer to the size of feature map C2. The third layer is an Eltwise layer, which multiplies feature map C2 with the output of the previous layer pixel by pixel; the activation function is a ReLU function. The fourth layer is a normalization layer, which performs a normalization operation on the output of the previous layer.
2. The salient region detection model-based background blurring method according to claim 1,
the step S3 specifically includes:
the fully-connected conditional random field obtains an output by convolving the saliency map in a fully-connected manner, and this output is fed into the conditional random field. Let x = (x1, x2, …, xn) denote the observed input data sequence and y = (y1, y2, …, yn) the state sequence. Given the input sequence, the joint conditional probability of the linear-chain CRF model over the state sequence is defined as:

p(y | x) = (1 / Z(x)) · exp( Σi Σk wk · fk(y(i-1), yi, x, i) )
3. The background blurring method based on the salient region detection model according to claim 1, wherein the step S5 specifically includes:
different weights are assigned to each pixel point for averaging according to its importance, and the weighted average matrix of the pixel matrix is computed separately for each of the three RGB channels of the original image, thereby realizing global blurring of the original image.
4. The background blurring method based on the salient region detection model according to claim 1, wherein the step S6 specifically includes:
Let the original image and the original blurred image be IO and IB' respectively; the clear foreground image ICF and the blurred background image IBB are extracted as:
ICF(i,j)=IO(i,j)*IF(i,j)
IBB(i,j)=IB'(i,j)*IB(i,j)
where i is the x-axis coordinate, j is the y-axis coordinate, and * denotes pixel-wise multiplication.
The clear foreground image ICF and the blurred background image IBB are then superposed to obtain the final image background blurring result.
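As a quick sanity check on the spatial sizes implied by the network of claim 1, the sketch below traces a hypothetical 224×224 input (the input resolution is an assumption; the patent does not fix one) through the four (2, 2) pooling layers:

```python
def trace_pool_sizes(h, w, num_pools=4):
    """Track the feature-map size through the four (2, 2) pooling layers
    (layers 3, 5, 7 and 9 of the detection model), each halving resolution."""
    sizes = [(h, w)]
    for _ in range(num_pools):
        h, w = h // 2, w // 2
        sizes.append((h, w))
    return sizes

print(trace_pool_sizes(224, 224))
# [(224, 224), (112, 112), (56, 56), (28, 28), (14, 14)]
```

The progressively smaller maps are what the deconvolution modules of layers fourteen through eighteen upsample back toward the original resolution.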
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810133575.9A CN108230243B (en) | 2018-02-09 | 2018-02-09 | Background blurring method based on salient region detection model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108230243A CN108230243A (en) | 2018-06-29 |
CN108230243B true CN108230243B (en) | 2021-04-27 |
Legal Events

- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant
- CF01: Termination of patent right due to non-payment of annual fee (granted publication date: 2021-04-27; termination date: 2022-02-09)