CN108230243B - Background blurring method based on salient region detection model - Google Patents


Info

Publication number
CN108230243B
CN108230243B (application CN201810133575.9A)
Authority
CN
China
Prior art keywords: layer, size, deconvolution, convolutional, output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201810133575.9A
Other languages
Chinese (zh)
Other versions
CN108230243A (en)
Inventor
余春艳
徐小丹
陈立
杨素琼
王秀
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN201810133575.9A
Publication of CN108230243A
Application granted
Publication of CN108230243B
Status: Expired - Fee Related
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00: Geometric image transformations in the plane of the image
    • G06T3/04: Context-preserving transformations, e.g. by using an importance map
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G06T7/00: Image analysis
    • G06T7/10: Segmentation; Edge detection
    • G06T7/11: Region-based segmentation
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/10: Image acquisition modality
    • G06T2207/10004: Still image; Photographic image


Abstract

The invention discloses a background blurring method based on a salient region detection model, comprising the following steps: acquiring an original image; constructing a convolutional network as the salient region detection model to obtain a saliency map of the original image; refining the saliency map with a fully connected conditional random field to obtain an optimized saliency map; binarizing or segmenting the optimized saliency map into a 0-1 matrix, from which a foreground index matrix and a background index matrix are obtained; blurring the whole original image with a distance-weighted average algorithm; and finally splicing the sharp foreground with the blurred background to generate the background-blurred image. The invention not only detects the complete salient region accurately but also yields a clear salient boundary, so the features of the foreground image are preserved when the background is blurred and the image content of the foreground is not damaged.

Description

Background blurring method based on salient region detection model
Technical Field
The invention relates to the technical field of digital image processing, in particular to a background blurring method based on a salient region detection model.
Background
Blurring the image background is a common processing step in tasks such as image rendering, beautification and enhancement: it highlights the target object and fades background information, improving the visual effect. At present, some image-processing software can perform this task well, but these methods require the foreground region to be labelled manually, which consumes considerable labour and is inconvenient for large-scale processing. In addition, the blur-diffusion shapes in the prior art are all regular and thus adapt poorly to complex and varied image content. Existing automatic background blurring techniques are immature in foreground edge extraction, leading to defects such as unclear boundaries and wrongly cut regions.
Disclosure of Invention
The background blurring method based on the salient region detection model provided by the invention can detect the whole salient region and performs well under various complex conditions, including multiple salient objects, small-scale salient objects and the like; it detects the complete salient region accurately and yields clear salient boundaries. The features of the foreground image are therefore preserved when the background is blurred, and the image content of the foreground is not damaged.
In order to achieve the purpose, the technical scheme of the invention is as follows: a background blurring method based on a salient region detection model comprises the following steps:
step S1: acquiring an original image;
step S2: constructing a saliency region detection model based on a convolutional neural network to obtain a saliency map of an original image;
step S3: putting the saliency map into a fully connected conditional random field for training to obtain an optimized saliency map;
step S4: binarizing or segmenting the optimized saliency map to obtain a 0-1 matrix SBM, and deriving a foreground index matrix IF and a background index matrix IB, defined as follows:
IF=SBM, IB=M×N-SBM
where M×N denotes the all-ones matrix with the same resolution as the original image;
step S5: utilizing a distance weighted average algorithm to realize global blurring of an original image to obtain an original blurred image;
step S6: extracting a clear foreground image from the original image by using a foreground index matrix IF, and extracting a blurred background image from the original blurred image by using a background index matrix IB; and finally, splicing the clear foreground image and the fuzzy background image to obtain a background blurring result.
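As a minimal numpy sketch of step S4, the binarization and index-matrix construction might look as follows (the 0.5 threshold and the toy saliency values are illustrative assumptions; the patent does not fix a binarization threshold):

```python
import numpy as np

def index_matrices(saliency, threshold=0.5):
    """Binarize an optimized saliency map (values in [0, 1]) into the
    0-1 matrix SBM, then derive the foreground index matrix IF = SBM
    and the background index matrix IB = all-ones - SBM."""
    sbm = (saliency >= threshold).astype(np.uint8)  # 0-1 matrix SBM
    fg_index = sbm                                  # IF
    bg_index = np.ones_like(sbm) - sbm              # IB
    return fg_index, bg_index

# Toy 2x2 "optimized saliency map"
saliency = np.array([[0.9, 0.2],
                     [0.7, 0.1]])
IF, IB = index_matrices(saliency)
```

IF and IB partition the image plane: every pixel belongs to exactly one of the two masks, which is what allows steps S5 and S6 to treat foreground and background independently.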
Further, the specific network structure of the salient region detection model is as follows:
the first layer is an input layer and inputs an original image;
the second layer is composed of two convolutional layers, wherein the first convolutional layer uses 64 convolutional kernels and has the size of (4, 4, 3), the second convolutional layer uses 64 convolutional kernels and has the size of (3, 3, 64), and the activation function is a ReLU function;
the third layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the fourth layer consists of two convolutional layers, where the first convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 64), the second convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 128), and the activation function is a ReLU function;
the fifth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the sixth layer consists of three convolutional layers, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 128), the second convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), the third convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the seventh layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the eighth layer consists of three convolutional layers, wherein the first convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), and the activation function is a ReLU function;
the ninth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the tenth layer consists of three convolutional layers, where the first convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), and the activation function is a ReLU function;
the eleventh layer is a pooling layer with the size of (3, 3) and an extended edge (padding) of size 1, and the activation function is a ReLU function;
the twelfth layer consists of two convolutional layers, where the first convolutional layer uses 1024 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 1024), and the activation function is the ReLU function;
the thirteenth layer consists of two convolutional layers and a normalization layer, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 1024), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the fourteenth layer is a deconvolution module, wherein two inputs are the thirteenth layer output and the twelfth layer output respectively;
the fifteenth layer is a deconvolution module, wherein two inputs are respectively the fourteenth layer output and the eighth layer output;
the sixteenth layer is a deconvolution module, wherein the two inputs are the fifteenth layer output and the sixth layer output respectively;
the seventeenth layer is a deconvolution module, wherein two inputs are respectively the sixteenth layer output and the fourth layer output;
the eighteenth layer is a deconvolution module, wherein two inputs are respectively the seventeenth layer output and the second layer output;
the nineteenth layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (4, 4, 512) is used, the input of the convolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twentieth layer consists of two deconvolution layers and a shear layer, wherein the first deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the second deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-first layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the nineteenth layer output, 2 convolution kernels with the size of (4, 4, 2) are used, the input of the convolution layer is the fifteenth layer output, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-second layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (16, 16, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-third layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-first layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is the sixteenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-fourth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
a twenty-fifth layer is composed of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-third layer output, 4 convolution kernels with the size of (4, 4, 4) are used, the input of the convolution layer is the seventeenth layer output, 1 convolution kernel with the size of (1, 1, 128) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-sixth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (4, 4, 3), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-seventh layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is twenty-fifth layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is seventeenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-eighth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (2, 2, 4), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-ninth layer is composed of a cascade layer and a convolution layer, the cascade layer performs channel connection on outputs of the twenty-eighth layer, the twenty-sixth layer, the twenty-fourth layer, the twenty-second layer and the twentieth layer, the convolution layer uses 1 convolution kernel, the size is (1, 1, 5), the activation function is a Sigmoid function, and a final output result is obtained;
The deconvolution module consists of a deconvolution layer, a shear layer, an Eltwise layer and a normalization layer. Its specific structure is as follows: let the inputs be feature maps C1 and C2, of sizes (h1, w1, k1) and (h2, w2, k2) respectively, where C1 is smaller than C2. The first layer is a deconvolution layer that uses k2 convolution kernels of size (4, 4, k1); its activation function is a ReLU function and its input is feature map C1. The second layer is a shear layer that crops the output of the previous layer to the size of feature map C2. The third layer is an Eltwise layer that multiplies feature map C2 with the output of the previous layer pixel by pixel; its activation function is a ReLU function. The fourth layer is a normalization layer that performs a normalization operation on the output of the previous layer.
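A rough single-channel numpy sketch of this deconvolution module follows. The learned (4, 4, k1) transposed convolution is approximated here by nearest-neighbour 2x upsampling, so this only illustrates the crop / element-wise-multiply / normalize wiring, not a trainable layer:

```python
import numpy as np

def deconv_module(c1, c2):
    """Sketch of the deconvolution module on single-channel maps.
    c1: (h1, w1) feature map, smaller than c2: (h2, w2).
    A real implementation would use a trainable transposed-convolution
    layer in place of the upsampling below."""
    # 1) "deconvolution": upsample c1 by 2 in each spatial dimension
    up = c1.repeat(2, axis=0).repeat(2, axis=1)
    # 2) shear (crop) layer: cut the upsampled map to c2's size
    h2, w2 = c2.shape
    cropped = up[:h2, :w2]
    # 3) Eltwise layer: pixel-by-pixel product with c2
    fused = cropped * c2
    # 4) normalization layer: rescale the result to [0, 1]
    rng = fused.max() - fused.min()
    return (fused - fused.min()) / rng if rng > 0 else fused * 0.0

c1 = np.arange(4.0).reshape(2, 2)  # small feature map C1
c2 = np.ones((3, 3))               # larger feature map C2
out = deconv_module(c1, c2)
```

The element-wise product lets the larger skip-connection map C2 gate the upsampled deeper features, which is the role this module plays between the encoder and decoder layers described above.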
Further, the step S3 specifically includes:
the saliency map is processed by the fully connected conditional random field and the result is taken as the optimized output. Let x = (x1, x2, …, xn) denote the observed input data sequence and y = (y1, y2, …, yn) the state sequence. Given the input sequence, the linear-chain CRF model defines the joint conditional probability of the state sequence as:

p(y | x) = (1 / Z(x)) · exp( Σ_i Σ_j w_j · f_j(y_(i-1), y_i, x, i) )

where Z(x) is a probability normalization factor conditioned on the input data sequence x, each f_j is an arbitrary feature function, each w_j is the weight of the corresponding feature function, and the exponential of the weighted feature sum is a strictly positive potential function.
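The joint conditional probability can be checked numerically on a toy linear-chain CRF by brute-force enumeration of all state sequences; the binary state space, feature functions and weights below are illustrative assumptions, not values from the patent:

```python
import itertools
import numpy as np

x = np.array([0.2, 0.8, 0.5])   # observed input data sequence
states = (0, 1)                 # binary state space (assumed)
w_emit, w_trans = 2.0, 0.5      # feature-function weights (assumed)

def score(y, x):
    """Weighted sum of toy feature functions: one emission feature per
    position plus one transition feature per adjacent state pair."""
    s = sum(w_emit * (x[i] if y[i] == 1 else 1.0 - x[i])
            for i in range(len(y)))
    s += sum(w_trans * float(y[i - 1] == y[i]) for i in range(1, len(y)))
    return s

# Partition function Z(x): sums exp(score) over every state sequence,
# so every term is a strictly positive potential.
Z = sum(np.exp(score(y, x))
        for y in itertools.product(states, repeat=len(x)))

def p(y, x):
    return np.exp(score(y, x)) / Z  # joint conditional probability p(y|x)

total = sum(p(y, x) for y in itertools.product(states, repeat=len(x)))
```

Because Z enumerates every sequence, the probabilities sum to one; practical CRF implementations compute Z with the forward algorithm rather than enumeration.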
Further, the step S5 specifically includes:
each pixel is given a different weight according to its importance and a weighted average is taken; for each of the three RGB channels of the original image, the weighted-average matrix of the corresponding pixel matrix is computed, thereby realizing global blurring of the original image.
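The patent does not give the exact weighting scheme; the sketch below assumes weights that fall off as 1/(1 + d) with Euclidean distance d from the centre pixel, applied to one channel with a naive edge-padded convolution:

```python
import numpy as np

def distance_weighted_kernel(radius=2):
    """Blur kernel whose weights decrease with Euclidean distance from
    the centre pixel; the 1/(1 + d) fall-off is an illustrative choice."""
    yy, xx = np.mgrid[-radius:radius + 1, -radius:radius + 1]
    w = 1.0 / (1.0 + np.sqrt(xx**2 + yy**2))
    return w / w.sum()  # normalize so the weights sum to 1

def blur_channel(ch, kernel):
    """Naive same-size convolution of one colour channel (edge-padded)."""
    r = kernel.shape[0] // 2
    padded = np.pad(ch, r, mode="edge")
    out = np.empty_like(ch, dtype=float)
    for i in range(ch.shape[0]):
        for j in range(ch.shape[1]):
            out[i, j] = (padded[i:i + 2*r + 1, j:j + 2*r + 1] * kernel).sum()
    return out

k = distance_weighted_kernel(radius=1)
flat = np.full((4, 4), 10.0)  # constant test channel
```

For an RGB image the same blur_channel call is made once per channel; note that blurring a constant channel returns the same constant, since the kernel weights sum to 1.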
Further, the step S6 specifically includes:
let the original image and the globally blurred image be IO and IB' respectively; the clear foreground image ICF and the blurred background image IBB are extracted as:
ICF(i,j)=IO(i,j)*IF(i,j)
IBB(i,j)=IB'(i,j)*IB(i,j)
where i is the x-axis coordinate and j is the y-axis coordinate;
and superposing the clear foreground image ICF and the fuzzy background image IBB to obtain a final image background blurring result.
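The extraction and superposition of step S6 reduce to element-wise products and a sum; a single-channel numpy sketch, with illustrative pixel values and masks, might read:

```python
import numpy as np

def composite(io, ib_blurred, fg_index, bg_index):
    """Step S6: ICF = IO * IF, IBB = IB' * IB, result = ICF + IBB.
    Shown single-channel; an RGB image repeats this per channel."""
    icf = io * fg_index            # sharp foreground pixels
    ibb = ib_blurred * bg_index    # blurred background pixels
    return icf + ibb

io = np.array([[100.0, 50.0],
               [200.0, 30.0]])    # "original image" IO
ib_blurred = np.array([[90.0, 60.0],
                       [150.0, 40.0]])  # globally blurred image IB'
fg = np.array([[1, 0],
               [1, 0]])           # foreground index matrix IF
bg = 1 - fg                       # background index matrix IB
result = composite(io, ib_blurred, fg, bg)
```

Each output pixel is taken from exactly one source image, so the foreground stays sharp while the background is replaced by its blurred counterpart.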
Compared with the prior art, the invention has the beneficial effects that:
(1) the method can detect the whole salient region and performs well under various complex conditions, including multiple salient objects, small-scale salient objects and the like;
(2) the invention not only detects the complete salient region accurately but also yields a clear salient boundary, so the features of the foreground image are preserved when the background is blurred and the image content of the foreground is not damaged.
Drawings
FIG. 1 is a schematic flow chart of a background blurring method based on a salient region detection model according to the present invention;
FIG. 2 is a graph illustrating comparison of results according to an embodiment of the present invention.
Detailed Description
The invention is further explained below with reference to the drawings and the embodiments.
As shown in fig. 1, the present invention provides a background blurring method based on a saliency region detection model, comprising the following steps:
step S1: acquiring an original image;
step S2: constructing a saliency region detection model based on a convolutional neural network to obtain a saliency map of an original image;
step S3: putting the saliency map into a fully connected conditional random field for training to obtain an optimized saliency map;
step S4: binarizing or segmenting the optimized saliency map to obtain a 0-1 matrix SBM, and deriving a foreground index matrix IF and a background index matrix IB, defined as follows:
IF=SBM, IB=M×N-SBM
where M×N denotes the all-ones matrix with the same resolution as the original image;
step S5: utilizing a distance weighted average algorithm to realize global blurring of an original image to obtain an original blurred image;
step S6: extracting a clear foreground image from the original image by using a foreground index matrix IF, and extracting a blurred background image from the original blurred image by using a background index matrix IB; and finally, splicing the clear foreground image and the fuzzy background image to obtain a background blurring result.
The specific network structure of the salient region detection model is as follows:
the first layer is an input layer and inputs an original image;
the second layer is composed of two convolutional layers, wherein the first convolutional layer uses 64 convolutional kernels and has the size of (4, 4, 3), the second convolutional layer uses 64 convolutional kernels and has the size of (3, 3, 64), and the activation function is a ReLU function;
the third layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the fourth layer consists of two convolutional layers, where the first convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 64), the second convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 128), and the activation function is a ReLU function;
the fifth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the sixth layer consists of three convolutional layers, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 128), the second convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), the third convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the seventh layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the eighth layer consists of three convolutional layers, wherein the first convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), and the activation function is a ReLU function;
The ninth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the tenth layer consists of three convolutional layers, where the first convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), and the activation function is a ReLU function;
the eleventh layer is a pooling layer, the size is (3, 3), the size of the extended edge is 1, and the activation function is a ReLU function;
the twelfth layer consists of two convolutional layers, where the first convolutional layer uses 1024 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 1024), and the activation function is the ReLU function;
the thirteenth layer consists of two convolutional layers and a normalization layer, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 1024), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the fourteenth layer is a deconvolution module, wherein two inputs are the thirteenth layer output and the twelfth layer output respectively;
the fifteenth layer is a deconvolution module, wherein two inputs are respectively the fourteenth layer output and the eighth layer output;
The sixteenth layer is a deconvolution module, wherein the two inputs are the fifteenth layer output and the sixth layer output respectively;
the seventeenth layer is a deconvolution module, wherein two inputs are respectively the sixteenth layer output and the fourth layer output;
the eighteenth layer is a deconvolution module, wherein two inputs are respectively the seventeenth layer output and the second layer output;
the nineteenth layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (4, 4, 512) is used, the input of the convolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twentieth layer consists of two deconvolution layers and a shear layer, wherein the first deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the second deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-first layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the nineteenth layer output, 2 convolution kernels with the size of (4, 4, 2) are used, the input of the convolution layer is the fifteenth layer output, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
The twenty-second layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (16, 16, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-third layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-first layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is the sixteenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-fourth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
a twenty-fifth layer is composed of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-third layer output, 4 convolution kernels with the size of (4, 4, 4) are used, the input of the convolution layer is the seventeenth layer output, 1 convolution kernel with the size of (1, 1, 128) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
The twenty-sixth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (4, 4, 3), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-seventh layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is twenty-fifth layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is seventeenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-eighth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (2, 2, 4), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-ninth layer is composed of a cascade layer and a convolution layer, the cascade layer performs channel connection on outputs of the twenty-eighth layer, the twenty-sixth layer, the twenty-fourth layer, the twenty-second layer and the twentieth layer, the convolution layer uses 1 convolution kernel, the size is (1, 1, 5), the activation function is a Sigmoid function, and a final output result is obtained;
The deconvolution module consists of a deconvolution layer, a shear layer, an Eltwise layer and a normalization layer, and the specific structure of the deconvolution module is as follows: let inputs be respectively characteristic diagrams C1And feature map C2The sizes are respectively (h)1,w1,k1) And (h)2,w2,k2) And the characteristic diagram C1Is smaller than the feature map C2The first layer is a deconvolution layer, using k2A convolution kernel of size (4, 4, k)1) The activation function is a ReLU function, and the input is a feature map C1(ii) a The second layer is a shear layer according to the characteristic diagram C2The size of the C is cut for the output of the previous layer, the third layer is an Eltwise layer, and the characteristic diagram C is obtained2Multiplying the output of the previous layer by pixel, wherein the activation function is a ReLU function; the fourth layer is a normalization layer, and normalization operation is carried out on the output of the previous layer.
The step S3 specifically includes:
the fully-connected conditional random field obtains an output after the saliency map is convolved by a fully-connected mode, the output result is input into the conditional random field, if x ═ (x1, x2, …, xn) represents an observed input data sequence, y ═ (y1, y2, …, yn) represents a state sequence, and under the condition that an input sequence is given, the joint conditional probability of the CRF model of the linear chain defining the state sequence is as follows:
Figure GDA0002948286510000081
Wherein: z is a probability normalization factor conditioned on the input data sequence x; f is an arbitrary characteristic function; w is the weight of each feature function,
Figure GDA0002948286510000082
is a strictly positive potential function.
The step S5 specifically includes:
according to the difference of the importance of each pixel point, different weight numbers are respectively given to average, and the weighted average matrix of three pixel matrixes is respectively solved for the RGB three channels of the original image, so that the global blurring of the original image is realized.
The step S6 specifically includes:
let the original image and the globally blurred image be IO and IB', respectively; the clear foreground image ICF and the blurred background image IBB are extracted as:
ICF(i,j)=IO(i,j)*IF(i,j)
IBB(i,j)=IB'(i,j)*IB(i,j)
where i is the x-axis coordinate and j is the y-axis coordinate. The clear foreground image ICF and the blurred background image IBB are then superposed to obtain the final image background blurring result.
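Steps S4 to S6 combine into the short sketch below, using IF = SBM and IB = (all-ones) − SBM as defined in the claims; the function and variable names are illustrative.

```python
import numpy as np

def blend(io, ib_blurred, sbm):
    """Foreground/background composition per steps S4-S6.
    io: original image (h, w, 3); ib_blurred: globally blurred image IB';
    sbm: binary (0-1) saliency mask, 1 marking the salient foreground."""
    i_f = sbm                              # foreground index matrix IF = SBM
    i_b = 1 - sbm                          # background index matrix IB = all-ones - SBM
    icf = io * i_f[..., None]              # ICF(i,j) = IO(i,j) * IF(i,j)
    ibb = ib_blurred * i_b[..., None]      # IBB(i,j) = IB'(i,j) * IB(i,j)
    return icf + ibb                       # superpose to obtain the bokeh result
```

Each pixel comes from exactly one source: the original image where the mask is 1, the blurred image where it is 0.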
Fig. 2 is a comparison of background blurring results obtained by applying the method of the present invention, wherein the left side shows the original image and the right side shows the background blurring result.
The above description is only of the preferred embodiments of the present invention, and the present invention is not limited to the above embodiments. It is to be understood that other modifications and variations directly derived or suggested to those skilled in the art without departing from the spirit and scope of the present invention are to be considered as included within the scope of the present invention.

Claims (4)

1. A background blurring method based on a salient region detection model is characterized by comprising the following steps:
step S1: acquiring an original image;
step S2: constructing a salient region detection model based on a convolutional neural network to obtain a saliency map of an original image;
step S3: putting the saliency map into a fully connected conditional random field for training to obtain an optimized saliency map;
step S4: carrying out binarization or segmentation processing on the optimized saliency map to obtain a 0-1 matrix SBM, and obtaining a foreground index matrix IF and a background index matrix IB, which are defined as follows:
IF=SBM,IB=M×N-SBM
wherein M×N is an all-ones matrix with the same resolution as the original image;
step S5: utilizing a distance weighted average algorithm to realize global blurring of an original image to obtain an original blurred image;
step S6: extracting a clear foreground image from the original image by using the foreground index matrix IF, and extracting a blurred background image from the original blurred image by using the background index matrix IB; finally, superposing the clear foreground image and the blurred background image to obtain the background blurring result;
the specific network structure of the salient region detection model is as follows:
the first layer is an input layer and inputs an original image;
the second layer is composed of two convolutional layers, wherein the first convolutional layer uses 64 convolutional kernels and has the size of (4, 4, 3), the second convolutional layer uses 64 convolutional kernels and has the size of (3, 3, 64), and the activation function is a ReLU function;
The third layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the fourth layer consists of two convolutional layers, where the first convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 64), the second convolutional layer uses 128 convolutional kernels and has a size of (3, 3, 128), and the activation function is a ReLU function;
the fifth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the sixth layer consists of three convolutional layers, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 128), the second convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), the third convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the seventh layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
the eighth layer consists of three convolutional layers, wherein the first convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 512), and the activation function is a ReLU function;
the ninth layer is a pooling layer with the size of (2, 2), and the activation function is a ReLU function;
The tenth layer consists of three convolutional layers, where the first convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), the third convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 512), and the activation function is a ReLU function;
the eleventh layer is a pooling layer, the size is (3, 3), the size of the extended edge is 1, and the activation function is a ReLU function;
the twelfth layer consists of two convolutional layers, where the first convolutional layer uses 1024 convolutional kernels and has a size of (3, 3, 512), the second convolutional layer uses 512 convolutional kernels and has a size of (3, 3, 1024), and the activation function is the ReLU function;
the thirteenth layer consists of two convolutional layers and a normalization layer, wherein the first convolutional layer uses 256 convolutional kernels and has the size of (3, 3, 1024), the second convolutional layer uses 512 convolutional kernels and has the size of (3, 3, 256), and the activation function is a ReLU function;
the fourteenth layer is a deconvolution module, wherein two inputs are the thirteenth layer output and the twelfth layer output respectively;
the fifteenth layer is a deconvolution module, wherein two inputs are respectively the fourteenth layer output and the eighth layer output;
the sixteenth layer is a deconvolution module, wherein the two inputs are the fifteenth layer output and the sixth layer output respectively;
The seventeenth layer is a deconvolution module, wherein two inputs are respectively the sixteenth layer output and the fourth layer output;
the eighteenth layer is a deconvolution module, wherein two inputs are respectively the seventeenth layer output and the second layer output;
the nineteenth layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (4, 4, 512) is used, the input of the convolution layer is the output of the fourteenth layer, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twentieth layer consists of two deconvolution layers and a shear layer, wherein the first deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the second deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-first layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the output of the nineteenth layer, 2 convolution kernels with the size of (4, 4, 2) are used, the input of the convolution layer is the output of the fifteenth layer, 1 convolution kernel with the size of (1, 1, 512) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
The twenty-second layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (16, 16, 1), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-third layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-first layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is the sixteenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer carries out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-fourth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (8, 8, 2), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
a twenty-fifth layer is composed of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is the twenty-third layer output, 4 convolution kernels with the size of (4, 4, 4) are used, the input of the convolution layer is the seventeenth layer output, 1 convolution kernel with the size of (1, 1, 128) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
The twenty-sixth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (4, 4, 3), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-seventh layer consists of a deconvolution layer, a convolution layer and a cascade layer, wherein the input of the deconvolution layer is twenty-fifth layer output, 3 convolution kernels with the size of (4, 4, 3) are used, the input of the convolution layer is seventeenth layer output, 1 convolution kernel with the size of (1, 1, 256) is used, the cascade layer is used for carrying out channel connection on the deconvolution layer and the output of the convolution layer, and the activation function is a ReLU function;
the twenty-eighth layer consists of a deconvolution layer and a shear layer, wherein the deconvolution layer uses 1 convolution kernel and has the size of (2, 2, 4), the shear layer is used for shearing the deconvolution layer result into the same size as the original image, and the activation function is a Sigmoid function;
the twenty-ninth layer is composed of a cascade layer and a convolution layer, the cascade layer performs channel connection on outputs of the twenty-eighth layer, the twenty-sixth layer, the twenty-fourth layer, the twenty-second layer and the twentieth layer, the convolution layer uses 1 convolution kernel, the size is (1, 1, 5), the activation function is a Sigmoid function, and a final output result is obtained;
The deconvolution module consists of a deconvolution layer, a shear layer, an Eltwise layer and a normalization layer, and its specific structure is as follows: let the inputs be feature map C1 and feature map C2, of sizes (h1, w1, k1) and (h2, w2, k2) respectively, where feature map C1 is smaller than feature map C2. The first layer is a deconvolution layer using k2 convolution kernels of size (4, 4, k1), with a ReLU activation function, whose input is feature map C1; the second layer is a shear layer, which crops the output of the previous layer to the size of feature map C2; the third layer is an Eltwise layer, which multiplies feature map C2 with the output of the previous layer pixel by pixel, with a ReLU activation function; the fourth layer is a normalization layer, which performs a normalization operation on the output of the previous layer.
2. The salient region detection model-based background blurring method according to claim 1,
the step S3 specifically includes:
the fully connected conditional random field takes as input the output obtained by convolving the saliency map in a fully connected manner. Let x = (x1, x2, …, xn) denote the observed input data sequence and y = (y1, y2, …, yn) the state sequence; given the input sequence, the linear-chain CRF model defines the joint conditional probability of the state sequence as:
p(y|x) = (1/Z(x)) · exp(Σj wj · fj(y, x))
Wherein: z is a probability normalization factor conditioned on the input data sequence x; f is an arbitrary characteristic function; w is the weight of each feature function,
Figure FDA0002948286500000042
is a strictly positive potential function.
3. The background blurring method based on the salient region detection model according to claim 1, wherein the step S5 specifically includes:
each pixel is assigned a different weight according to its importance, and the weighted average matrix is computed separately for the pixel matrices of the three RGB channels of the original image, thereby realizing global blurring of the original image.
4. The background blurring method based on the salient region detection model according to claim 1, wherein the step S6 specifically includes:
let the original image and the globally blurred image be IO and IB', respectively; the clear foreground image ICF and the blurred background image IBB are extracted as:
ICF(i,j)=IO(i,j)*IF(i,j)
IBB(i,j)=IB'(i,j)*IB(i,j)
where i is the x-axis coordinate and j is the y-axis coordinate. The clear foreground image ICF and the blurred background image IBB are then superposed to obtain the final image background blurring result.
CN201810133575.9A 2018-02-09 2018-02-09 Background blurring method based on salient region detection model Expired - Fee Related CN108230243B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810133575.9A CN108230243B (en) 2018-02-09 2018-02-09 Background blurring method based on salient region detection model

Publications (2)

Publication Number Publication Date
CN108230243A CN108230243A (en) 2018-06-29
CN108230243B true CN108230243B (en) 2021-04-27

Family

ID=62661349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810133575.9A Expired - Fee Related CN108230243B (en) 2018-02-09 2018-02-09 Background blurring method based on salient region detection model

Country Status (1)

Country Link
CN (1) CN108230243B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109089040B (en) * 2018-08-20 2021-05-14 Oppo广东移动通信有限公司 Image processing method, image processing device and terminal equipment
CN109636764A (en) * 2018-11-01 2019-04-16 上海大学 A kind of image style transfer method based on deep learning and conspicuousness detection
CN109618173B (en) * 2018-12-17 2021-09-28 深圳Tcl新技术有限公司 Video compression method, device and computer readable storage medium
CN109727264A (en) * 2019-01-10 2019-05-07 南京旷云科技有限公司 Image generating method, the training method of neural network, device and electronic equipment
CN111680702B (en) * 2020-05-28 2022-04-01 杭州电子科技大学 Method for realizing weak supervision image significance detection by using detection frame
CN111861867B (en) * 2020-07-02 2024-02-13 泰康保险集团股份有限公司 Image background blurring method and device
CN116582743A (en) * 2023-07-10 2023-08-11 荣耀终端有限公司 Shooting method, electronic equipment and medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105513105A (en) * 2015-12-07 2016-04-20 天津大学 Image background blurring method based on saliency map
CN107169954A (en) * 2017-04-18 2017-09-15 华南理工大学 A kind of image significance detection method based on parallel-convolution neutral net
CN107247952A (en) * 2016-07-28 2017-10-13 哈尔滨工业大学 The vision significance detection method for the cyclic convolution neutral net supervised based on deep layer
CN107564025A (en) * 2017-08-09 2018-01-09 浙江大学 A kind of power equipment infrared image semantic segmentation method based on deep neural network
CN107610141A (en) * 2017-09-05 2018-01-19 华南理工大学 A kind of remote sensing images semantic segmentation method based on deep learning

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection;Zhaowei Cai 等;《arXiv:1607.07155v1》;20160725;全文 *
DHSNet: Deep Hierarchical Saliency Network for Salient Object Detection;Nian Liu 等;《2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)》;20161212;全文 *
Fully Convolutional Networks for Semantic Segmentation;Evan Shelhamer 等;《IEEE Transactions on Pattern Analysis and Machine Intelligence》;20170401;全文 *
基于视觉显著性的空频域多特征的目标检测方法研究;杜慧;《中国优秀硕士学位论文全文数据库 信息科技辑》;20170615(第06期);全文 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210427

Termination date: 20220209