CN113379833B - Image visible watermark positioning and segmenting method based on neural network - Google Patents


Info

Publication number
CN113379833B
Authority
CN
China
Prior art keywords: watermark, image, network, feature, images
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110709821.2A
Other languages
Chinese (zh)
Other versions
CN113379833A
Inventor
杨依忠
李祥
黄海霞
张永强
程心
张章
解光军
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN202110709821.2A
Publication of CN113379833A
Application granted
Publication of CN113379833B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/0021 Image watermarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2201/00 General purpose image data processing
    • G06T2201/005 Image watermarking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Editing Of Facsimile Originals (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a method for positioning and segmenting visible image watermarks based on a neural network, which comprises the following steps: 1. generating a watermark image data set; 2. constructing a deep learning network framework for positioning and segmenting visible watermarks; 3. training the network; 4. testing the network. By building a data set and training a convolutional neural network on it, the method accurately positions and segments visible watermarks on images, improving the accuracy and generalization ability of watermark positioning and segmentation, and the extracted watermark information plays a key role in subsequent watermark removal, identification, and image copyright protection.

Description

Image visible watermark positioning and segmenting method based on neural network
Technical Field
The invention relates to the technical field of visible digital image watermarks, in particular to a method for positioning and segmenting an image visible watermark based on a neural network.
Background
With the arrival of the information society and the big-data era, the number of images on the Internet is growing explosively, and image copyright protection is receiving more and more attention. To identify a watermarked image, and even to remove the watermark and repair the image, positioning and segmenting the watermark is essential.
In traditional algorithms, watermark removal relies on manually supplying the position and shape of the watermark. Although this can achieve good results, the difficulty of hand-designing features keeps increasing, placing ever higher demands on the designer's expert knowledge. In recent years, deep-learning-based algorithms have performed well in the digital image field, and neural networks have begun to be applied to visible watermark removal. However, existing watermark removal methods perform an image-to-image conversion, producing the watermark-free picture directly from the watermarked one. Because the label is a full-color image of the same size, it carries a huge amount of information, and such methods generalize poorly.
Disclosure of Invention
In order to overcome the defects of the prior art, the invention provides a method for positioning and segmenting visible image watermarks based on a neural network, so that the neural network automatically extracts watermark features and automatically positions and segments the watermark. This improves the accuracy and generalization ability of watermark positioning and segmentation and provides useful support for subsequent watermark removal, watermark identification, and image copyright protection.
In order to achieve the purpose, the technical scheme of the invention is as follows:
The invention relates to a method for positioning and segmenting visible image watermarks based on a convolutional neural network, characterized by comprising the following steps:
Step 1, generating the input images and label images of the training set and test set of a visible-watermark image data set;
Step 1.1, acquiring an original color image set I = {I_k | k = 1, 2, …, m}, where I_k denotes the k-th original color image, whose size is a × b, with a the length, b the width, and m the number of images in the set;
randomly selecting the watermark pattern from k character strings or other watermark templates, and randomly selecting a watermark size smaller than the original color image, thereby generating an RGB three-channel watermark image set W = {W_k | k = 1, 2, …, m}, where W_k denotes the k-th watermark image; in W_k, all pixels other than the watermark pixels are initialized to 0;
randomly determining the position of the watermark, and adding the watermark image set W to the original color image set I using formula (1) to obtain the watermarked image set I_W = {I_W_k | k = 1, 2, …, m} as the input images of the training set;
I_W_k = a′·I_k + (1 − a′)·W_k   (1)
In formula (1): I_W_k is the k-th watermarked image; a′ is the transparency of the added watermark, randomly selected from 0 to 1;
Step 1.2, converting the RGB three-channel watermark image set W into a grayscale image set G = {G_k | k = 1, 2, …, m}, where G_k denotes the k-th grayscale image, and binarizing G to obtain the image label set L = {L_k | k = 1, 2, …, m}, where L_k denotes the label image of the k-th watermarked image I_W_k;
Step 1.3, generating the input images and label images of a test set, distinct from the training set, following the procedures of step 1.1 and step 1.2;
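To make step 1 concrete, the following is a minimal NumPy sketch of generating one training pair. Only the blend of formula (1) and the threshold-1 binarization of step 1.2 come from the text; the function name is illustrative, and applying the blend only at watermark pixels (W_k is 0 elsewhere, so the rest of the image is kept unchanged) is an assumption of this sketch.

```python
import numpy as np

def make_training_pair(original, watermark, alpha):
    """Blend watermark image W_k into original image I_k per formula (1).

    original:  (a, b, 3) uint8 array, the clean color image I_k
    watermark: (a, b, 3) uint8 array, zero everywhere except watermark pixels
    alpha:     transparency a', drawn at random from (0, 1)
    """
    mask = watermark.any(axis=2)  # True where the watermark has nonzero pixels
    blended = original.astype(np.float32)
    # Formula (1): I_W_k = a' * I_k + (1 - a') * W_k, applied at watermark pixels.
    blended[mask] = alpha * blended[mask] + (1.0 - alpha) * watermark[mask]
    input_image = blended.clip(0, 255).astype(np.uint8)

    # Step 1.2: grayscale conversion and binarization with threshold 1
    # (any pixel greater than 0 becomes 1) gives the label image L_k.
    gray = watermark.mean(axis=2)
    label = (gray > 0).astype(np.uint8)
    return input_image, label
```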
Step 2, constructing a deep learning network for watermark positioning and segmentation, comprising: a multi-convolution-kernel feature extraction network, a feature fusion module, a feature decoding network, and an output network;
Step 2.1, constructing the multi-convolution-kernel feature extraction network, comprising H parallel coding networks whose convolution kernels differ in size and are used to extract features at different scales; each coding network comprises N coding modules, and each coding module comprises a convolutional layer, a BN layer, and a ReLU activation layer; the output channel counts of the N coding modules' convolutional layers are set to increase and are denoted n_1, n_2, …, n_N;
Step 2.2, constructing the feature fusion module, comprising F convolutional layers whose channel counts are set to f_1, f_2, …, f_F, where f_1 = H × n_N and f_F = n_N;
Step 2.3, constructing the feature decoding network, which comprises N decoding modules; each decoding module comprises a convolutional layer, a BN layer, and a ReLU activation layer; the output channel counts of the decoding modules' convolutional layers are set to decrease and are denoted n′_1, n′_2, …, n′_N, with n′_1 = n_N and n′_N = n_1;
Step 2.4, constructing the output network, comprising a convolutional layer with n_1 input channels and 1 output channel, followed by a sigmoid layer;
Step 2.5, setting the current training period to t and the maximum training period to T_max, and the number of images per batch to batch_size;
Step 2.6, in the t-th training period, feeding the input images of the training set, in batches of size batch_size, in parallel to the H coding networks; after feature extraction by the N coding modules of each coding network, H coding feature images of size a × b are output;
Step 2.7, inputting the coding feature images into the feature fusion module: the coding feature images are first concatenated along the channel dimension to obtain a feature image of size a × b with H × n_N channels, which is then fused through the F convolutional layers, finally outputting a fused feature image of size a × b with n_N channels;
Step 2.8, inputting the fused feature image into the feature decoding network; after decoding by the N feature decoding modules, a feature image of size a × b with n_1 channels is output;
Step 2.9, passing the feature image through the output network: its convolutional layer reduces the channel count to 1, and the resulting output map of size a × b is passed through the Sigmoid layer, which maps it to values between 0 and 1. This yields a probability map for each input image of the batch; for the k-th input image I_W_k, the map giving the probability that each pixel belongs to the watermark or non-watermark class is denoted O_k;
Step 3, training the network:
Step 3.1, computing the gap loss loss(O_k, L_k) between the probability map O_k and the label image L_k using formula (2):
loss(O_k, L_k) = −(1/n) · Σ_{i=1}^{n} [ L_{k,i}·log(O_{k,i}) + (1 − L_{k,i})·log(1 − O_{k,i}) ]   (2)
In formula (2), O_{k,i} denotes the value of the i-th pixel of the probability map O_k after flattening to one dimension, L_{k,i} denotes the value of the i-th pixel of the k-th label image L_k after flattening, n denotes the total number of pixels of the flattened image, and i denotes the pixel coordinate in the flattened image;
Step 3.2, back-propagating the gap loss through the deep learning network for watermark positioning and segmentation and training with the Adam optimization algorithm to update the network parameters until the gap loss converges, thereby obtaining the watermark positioning and segmentation network with optimal parameters;
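A short training-loop sketch for step 3 follows, assuming the WatermarkSegNet sketch above and a DataLoader yielding (watermarked image, label) batches; the per-pixel binary cross-entropy of formula (2) is expressed through nn.BCELoss, and the Adam settings follow the embodiment described below (learning rate 0.0001, betas 0.9 and 0.999).

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=200, device="cuda"):
    model.to(device)
    # Step 3.2: Adam optimizer; lr and betas follow the embodiment below.
    opt = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.9, 0.999))
    bce = nn.BCELoss()  # formula (2) averaged over the flattened pixels
    for epoch in range(epochs):
        for images, labels in loader:
            images = images.to(device)          # (batch_size, 3, a, b)
            labels = labels.to(device).float()  # (batch_size, 1, a, b), values in {0, 1}
            probs = model(images)               # (batch_size, 1, a, b), values in (0, 1)
            loss = bce(probs, labels)           # gap loss between O_k and L_k
            opt.zero_grad()
            loss.backward()                     # back-propagate the gap loss
            opt.step()
```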
Step 4, testing the watermarked image set of the test set with the optimal-parameter watermark positioning and segmentation network, thereby obtaining the corresponding watermark probability maps;
and setting a threshold with which the watermark probability map is binarized: pixels greater than or equal to the threshold are set as watermark pixels and pixels below the threshold as non-watermark pixels, thereby obtaining the watermark label image.
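A one-step sketch of the test-time binarization in step 4. The embodiment below uses a threshold of 120, which suggests the probability map is first scaled to 0 to 255; that scaling, like the function name, is an assumption of this sketch (with probabilities left in [0, 1], a threshold such as 0.5 plays the same role).

```python
import numpy as np

def binarize_probability_map(prob_map, threshold=120):
    # Map [0, 1] probabilities to 0-255, then threshold: pixels at or above
    # the threshold become watermark pixels (1), the rest non-watermark (0).
    scaled = (prob_map * 255.0).astype(np.uint8)
    return (scaled >= threshold).astype(np.uint8)
```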
Compared with the prior art, the invention has the beneficial effects that:
1. The invention is the first to propose building a watermark positioning and segmentation data set; by training a neural network on this training set, an accurate and efficient watermark positioning and segmentation capability is obtained, positioning and segmentation of the watermark is realized, and accurate, useful information is obtained for further watermark removal.
2. Compared with traditional algorithms that hand-design features, the feature extraction module and the segmentation module are both implemented by the neural network, which positions and segments watermarks by automatically learning the image features of the data set, replacing manual effort and improving the efficiency of watermark positioning and segmentation;
3. The invention is the first to introduce the binary-classification idea into watermark positioning and removal, treating watermark pixels as one class and all other pixels as the other when positioning and segmenting the watermark, which greatly improves the accuracy of watermark positioning and segmentation.
4. In the proposed watermark positioning and segmentation data set, randomly varying the watermark's shape, position, size, transparency, and so on greatly improves the robustness of the watermark positioning and segmentation capability.
5. The proposed deep learning network framework for positioning and segmenting visible watermarks adopts multi-scale feature extraction, feature decoding and fusion, and similar techniques, improving the positioning and segmentation of watermarks of different sizes and positions and raising the framework's accuracy across diverse watermark positioning and segmentation tasks.
Drawings
FIG. 1 shows example images from the training set of the watermark positioning and segmentation data set built by the present invention;
FIG. 2 shows example images from the test set of the watermark positioning and segmentation data set built by the present invention;
FIG. 3 is a diagram of the neural network structure for watermark positioning and segmentation according to the present invention;
FIG. 4 is a flow chart of the method of the present invention;
FIG. 5 shows test result images on the test set of the present invention.
Detailed Description
In this embodiment, the method for positioning and segmenting visible image watermarks based on a convolutional neural network first establishes a watermark positioning and segmentation data set, then trains the proposed neural network on that data set, and finally obtains the segmented watermark, realizing watermark positioning and segmentation. As shown in fig. 4, the specific steps are as follows:
Step 1, generating the input images and label images of the training set and test set of a visible-watermark image data set;
Step 1.1, acquiring an original color image set I = {I_k | k = 1, 2, …, m}, where I_k denotes the k-th original color image, whose size is a × b, with a the length, b the width, and m the number of images in the set;
randomly selecting the watermark pattern from k character strings or other watermark templates, and randomly selecting a watermark size smaller than the original color image, thereby generating an RGB three-channel watermark image set W = {W_k | k = 1, 2, …, m}, where W_k denotes the k-th watermark image; in W_k, all pixels other than the watermark pixels are initialized to 0;
In the specific implementation, the training set is generated first: 1000 RGB three-channel watermark images are generated by the algorithm, with all pixels outside the watermark region initialized to 0. The watermark pattern is randomly selected from the three character strings HFUT, ABCD, and TRAIN, and the watermark size is randomly selected from 30, 45, and 60. The watermark position is random over the entire image. The size and number of images can be unified as needed; here the size a × b of the acquired original color images is set to 300 × 300 and the number of images m to 1000.
The position of the watermark is randomly determined, and the watermark image set W is added to the original color image set I using formula (1) to obtain the watermarked image set I_W = {I_W_k | k = 1, 2, …, m} as the input images of the training set;
I_W_k = a′·I_k + (1 − a′)·W_k   (1)
In formula (1): I_W_k is the k-th watermarked image; a′ is the transparency of the added watermark, randomly selected from 0 to 1. In this embodiment, the watermark transparency a′ is randomly selected from 0.2, 0.5, and 0.8.
Step 1.2, converting the RGB three-channel watermark image set W into a grayscale image set G = {G_k | k = 1, 2, …, m}, where G_k denotes the k-th grayscale image, and binarizing G to obtain the image label set L = {L_k | k = 1, 2, …, m}, where L_k denotes the label image of the k-th watermarked image I_W_k. The binarization threshold is set to 1; that is, any pixel greater than 0 is set to 1. Examples of training-set inputs and labels are shown in FIG. 1.
Step 1.3, generating the input images and label images of a test set, distinct from the training set, following the procedures of step 1.1 and step 1.2. In the specific implementation, the test set is obtained by repeating step 1; note that the selected color images differ from those of the training set and the watermark pattern is changed to the character string TEST; the size a × b is likewise 300 × 300 and the number m is 100. Examples of test-set inputs and labels are shown in FIG. 2.
Step 2, constructing a deep learning network for watermark positioning and segmentation, comprising the following steps: the system comprises a multi-convolution kernel feature extraction network, a feature fusion module, a feature decoding network and an output network;
step 2.1, constructing the multi-convolution kernel feature extraction network, comprising: h parallel coding networks, the convolution kernels of each coding network are different in size, and the convolutional kernels are used for extracting features with different scales; each coding network comprises N coding modules, and each coding module comprises a convolutional layer, a BN layer and an activation function ReLU layer; the number of output channels of the convolution layer of the N coding modules is set in an increasing way and is recorded as N 1 ,n 2 ,…n N
In this embodiment, 3 encoding modules, E1, E2, and E3, are provided, and the number of output convolution channels of the 3 encoding modules is 30, 60, and 90. The convolution kernel size of E1 is 3 × 3, and the step size and padding are set to 1. The convolution kernel size of E2 is 5 × 5, and the step size and padding are set to 1 and 2. The convolution kernel size of E3 is 7 × 7, with step size and padding set to 1 and 3.
Step 2.2, constructing the feature fusion module, including: f convolutional layers, the number of channels of the F convolutional layers is set as F 1 ,f 2 ,…f F Wherein f is 1 =3n N ,f F =n N
In this example, set F to 2, first convolutional layer input channel 270 output to 90, second convolutional layer input channel 90 output channel also 90, convolutional kernel size 3 × 3, step size and pad set to 1
And 2.3, constructing the feature decoding network, wherein the feature decoding network comprises N decoding modules, each decoding module comprises a convolutional layer, a BN layer and an activation function ReLU layer, and the number of output channels of the convolutional layer of the decoding modules is set in a descending manner and is recorded as N' 1 ,n′ 2 ,…n′ N (ii) a And n' 1 =n N ,n′ N =n 1 (ii) a In a specific implementation, N is set to 3, the number of output channels is 90, 60, 30, the convolution kernel size is 3 × 3, and the step size and padding are set to 1
Step 2.4, constructing the output network including an input channel n 1 And outputting a convolution layer with a channel of 1 and a sigmoid layer; in which the convolution kernel size is 3 x 3 and the step size and padding are set to 1n 1 Is 30; the structure of the network partitioned by watermark location in this example is shown in fig. 3.
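Instantiating the WatermarkSegNet sketch from the disclosure section above with this embodiment's settings gives a quick shape check; the random input stands in for one batch of ten 300 × 300 RGB images.

```python
import torch

# Three parallel coding networks with 3x3, 5x5, and 7x7 kernels, channel
# counts 30/60/90 (so the fusion input has 270 channels), decoder channels
# 90/60/30, and a 1-channel sigmoid head, as in this embodiment.
model = WatermarkSegNet(kernels=(3, 5, 7), enc_channels=(30, 60, 90))
x = torch.randn(10, 3, 300, 300)  # one batch: batch_size=10 watermarked images
probs = model(x)
print(probs.shape)  # expected: torch.Size([10, 1, 300, 300])
```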
Step 2.5, setting the current training period as T and the maximum training period as T max And the number of each batch of processed images is batch _ size; the maximum training period is set to 200 and the batch _ size is set to 10.
2.6, in the t-th training period, sequentially and parallelly inputting the input images of the training set to a plurality of coding networks according to the batch size of batch _ size, and respectively outputting H coding feature images with the size of a multiplied by b through feature extraction of N coding modules of each coding network; in this embodiment, N is 3, a × b is 300 × 300, and batch _ size is set to 10.
Step 2.7, inputting the coding feature images into the feature fusion network, firstly splicing the coding feature images in a channel connection mode, and then obtaining the coded feature images with the size of a multiplied by b and the number of channels of H multiplied by n N The feature images are subjected to feature fusion through F convolution layers, and finally the output size is a multiplied by b and the number of channels is n N The fused feature image of (1); in a specific embodiment, N is 3, a × b is 300 × 300, F is 2, N N Is 90.
Step 2.8, inputting the fusion feature image into a feature decoding network, outputting a feature image with a size of a multiplied by b and a channel number of N after feature decoding of N feature decoding modules 1 The characteristic image of (1); n is 3, a × b is 300 × 300, N 1 Is 30.
2.9, the characteristic image output by the characteristic decoding network passes through an output network, the output characteristic diagram of the decoding network is changed into an output result diagram with the channel number of 1 and the same size as the input image of the network, namely, a multiplied by b, the output result diagram passes through a Sigmoid layer, the range of the output result is mapped between 0 and 1, and k input images I _ W in the first training set in the batch _ size are obtained k Probability graph O with each pixel point being watermark part or non-watermark part k
Step 3, training the network:
Step 3.1, computing the gap loss loss(O_k, L_k) between the probability map O_k and the label image L_k using formula (2):
loss(O_k, L_k) = −(1/n) · Σ_{i=1}^{n} [ L_{k,i}·log(O_{k,i}) + (1 − L_{k,i})·log(1 − O_{k,i}) ]   (2)
In formula (2), O_{k,i} denotes the value of the i-th pixel of the probability map O_k after flattening to one dimension, L_{k,i} denotes the value of the i-th pixel of the label image L_k after flattening, n denotes the total number of pixels of the flattened image, and i denotes the pixel coordinate in the flattened image.
Step 3.2, back-propagating the gap loss through the deep learning network for watermark positioning and segmentation and training with the Adam optimization algorithm to update the network parameters until the gap loss converges, thereby obtaining the watermark positioning and segmentation network with optimal parameters; the Adam optimizer's learning rate is set to 0.0001 and its betas to 0.9 and 0.999.
Step 4, testing the watermarked image set of the test set with the optimal-parameter watermark positioning and segmentation network, thereby obtaining the corresponding watermark probability maps;
a threshold is set with which the watermark probability map is binarized: pixels greater than or equal to the threshold are set as watermark pixels and pixels below the threshold as non-watermark pixels, thereby obtaining the watermark label image.
In the specific implementation, the threshold is 120: pixels greater than or equal to 120 are judged to be watermark pixels and pixels below 120 to be non-watermark pixels. Qualitative output results on the test set are shown in fig. 5, and the overall quantitative results on the test set are given in Table 1.
TABLE 1  SSIM and PSNR scores on the test set of the present invention

Epoch   SSIM   PSNR
25      0.96   18
100     0.97   32
150     0.98   33
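A small evaluation sketch for the scores of Table 1, assuming scikit-image's SSIM and PSNR are computed between the predicted watermark label images and the ground-truth labels and averaged over the test set; exactly which image pairs the patent compares and how it averages are not stated, so the pairing and averaging here are assumptions.

```python
import numpy as np
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

def evaluate(pred_labels, true_labels):
    # pred_labels / true_labels: iterables of binary (a, b) uint8 arrays.
    ssim_scores, psnr_scores = [], []
    for pred, true in zip(pred_labels, true_labels):
        ssim_scores.append(structural_similarity(true, pred, data_range=1))
        psnr_scores.append(peak_signal_noise_ratio(true, pred, data_range=1))
    return float(np.mean(ssim_scores)), float(np.mean(psnr_scores))
```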
In summary, the invention provides a method for positioning and segmenting visible image watermarks based on a neural network that introduces the binary-classification idea into watermark positioning and segmentation, treating watermark pixels as one class and all other pixels as the other; it innovatively establishes a watermark segmentation data set and designs the corresponding neural network, providing a new idea and method for watermark identification and removal.

Claims (1)

1. A method for positioning and segmenting visible image watermarks based on a neural network, characterized by comprising the following steps:
step 1, generating input images and label images of a training set and a test set in a visible watermark image data set;
Step 1.1, acquiring an original color image set I = {I_k | k = 1, 2, …, m}, where I_k denotes the k-th original color image, whose size is a × b, with a the length, b the width, and m the number of images in the set;
randomly selecting the watermark pattern from k character strings or other watermark templates, and randomly selecting a watermark size smaller than the original color image, thereby generating an RGB three-channel watermark image set W = {W_k | k = 1, 2, …, m}, where W_k denotes the k-th watermark image; in W_k, all pixels other than the watermark pixels are initialized to 0;
randomly determining the position of the watermark, and adding the watermark image set W to the original color image set I using formula (1) to obtain the watermarked image set I_W = {I_W_k | k = 1, 2, …, m} as the input images of the training set;
I_W_k = a′·I_k + (1 − a′)·W_k   (1)
in formula (1): I_W_k is the k-th watermarked image; a′ is the transparency of the added watermark, randomly selected from 0 to 1;
Step 1.2, converting the RGB three-channel watermark image set W into a grayscale image set G = {G_k | k = 1, 2, …, m}, where G_k denotes the k-th grayscale image, and binarizing G to obtain the image label set L = {L_k | k = 1, 2, …, m}, where L_k denotes the label image of the k-th watermarked image I_W_k;
Step 1.3, generating the input images and label images of a test set, distinct from the training set, following the procedures of step 1.1 and step 1.2;
Step 2, constructing a deep learning network for watermark positioning and segmentation, comprising: a multi-convolution-kernel feature extraction network, a feature fusion module, a feature decoding network, and an output network;
Step 2.1, constructing the multi-convolution-kernel feature extraction network, comprising H parallel coding networks whose convolution kernels differ in size and are used to extract features at different scales; each coding network comprises N coding modules, and each coding module comprises a convolutional layer, a BN layer, and a ReLU activation layer; the output channel counts of the N coding modules' convolutional layers are set to increase and are denoted n_1, n_2, …, n_N;
Step 2.2, constructing the feature fusion module, comprising F convolutional layers whose channel counts are set to f_1, f_2, …, f_F, where f_1 = H × n_N and f_F = n_N;
Step 2.3, constructing the feature decoding network, which comprises N decoding modules; each decoding module comprises a convolutional layer, a BN layer, and a ReLU activation layer; the output channel counts of the decoding modules' convolutional layers are set to decrease and are denoted n′_1, n′_2, …, n′_N, with n′_1 = n_N and n′_N = n_1;
Step 2.4, constructing the output network, comprising a convolutional layer with n_1 input channels and 1 output channel, followed by a sigmoid layer;
Step 2.5, setting the current training period to t and the maximum training period to T_max, and the number of images per batch to batch_size;
Step 2.6, in the t-th training period, feeding the input images of the training set, in batches of size batch_size, in parallel to the H coding networks; after feature extraction by the N coding modules of each coding network, H coding feature images of size a × b are output;
Step 2.7, inputting the coding feature images into the feature fusion module: the coding feature images are first concatenated along the channel dimension to obtain a feature image of size a × b with H × n_N channels, which is then fused through the F convolutional layers, finally outputting a fused feature image of size a × b with n_N channels;
Step 2.8, inputting the fused feature image into the feature decoding network; after decoding by the N feature decoding modules, a feature image of size a × b with n_1 channels is output;
Step 2.9, passing the feature image through the output network: its convolutional layer reduces the channel count to 1, and the resulting output map of size a × b is passed through the Sigmoid layer, which maps it to values between 0 and 1. This yields a probability map for each input image of the batch; for the k-th input image I_W_k, the map giving the probability that each pixel belongs to the watermark or non-watermark class is denoted O_k;
Step 3, training the network:
Step 3.1, computing the gap loss loss(O_k, L_k) between the probability map O_k and the label image L_k using formula (2):
loss(O_k, L_k) = −(1/n) · Σ_{i=1}^{n} [ L_{k,i}·log(O_{k,i}) + (1 − L_{k,i})·log(1 − O_{k,i}) ]   (2)
In formula (2), O_{k,i} denotes the value of the i-th pixel of the probability map O_k after flattening to one dimension, L_{k,i} denotes the value of the i-th pixel of the k-th label image L_k after flattening, n denotes the total number of pixels of the flattened image, and i denotes the pixel coordinate in the flattened image;
Step 3.2, back-propagating the gap loss through the deep learning network for watermark positioning and segmentation and training with the Adam optimization algorithm to update the network parameters until the gap loss converges, thereby obtaining the watermark positioning and segmentation network with optimal parameters;
Step 4, testing the watermarked image set of the test set with the optimal-parameter watermark positioning and segmentation network, thereby obtaining the corresponding watermark probability maps;
and setting a threshold with which the watermark probability map is binarized: pixels greater than or equal to the threshold are set as watermark pixels and pixels below the threshold as non-watermark pixels, thereby obtaining the watermark label image.
CN202110709821.2A 2021-06-25 2021-06-25 Image visible watermark positioning and segmenting method based on neural network Active CN113379833B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110709821.2A CN113379833B (en) 2021-06-25 2021-06-25 Image visible watermark positioning and segmenting method based on neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110709821.2A CN113379833B (en) 2021-06-25 2021-06-25 Image visible watermark positioning and segmenting method based on neural network

Publications (2)

Publication Number Publication Date
CN113379833A CN113379833A (en) 2021-09-10
CN113379833B true CN113379833B (en) 2022-08-05

Family

ID=77579208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110709821.2A Active CN113379833B (en) 2021-06-25 2021-06-25 Image visible watermark positioning and segmenting method based on neural network

Country Status (1)

Country Link
CN (1) CN113379833B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113902830B (en) * 2021-12-08 2022-03-04 腾讯科技(深圳)有限公司 Method for generating track road network
CN114862650B (en) * 2022-06-30 2022-09-23 南京信息工程大学 Neural network watermark embedding method and verification method
CN116303909B (en) * 2023-04-26 2023-08-22 山东齐鲁电子招标采购服务有限公司 Matching method, equipment and medium for electronic bidding documents and clauses
CN117437108B (en) * 2023-12-21 2024-03-08 武汉圆周率软件科技有限公司 Watermark embedding method for image data

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107808358A (en) * 2017-11-13 2018-03-16 携程计算机技术(上海)有限公司 Image watermark automatic testing method
CN111918144A (en) * 2020-08-12 2020-11-10 桂林电子科技大学 Method for removing video watermark based on deep learning
CN111932431A (en) * 2020-07-07 2020-11-13 华中科技大学 Visible watermark removing method based on watermark decomposition model and electronic equipment
CN112036300A (en) * 2020-08-31 2020-12-04 合肥工业大学 Moving target detection method based on multi-scale space-time propagation layer

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11138964B2 (en) * 2019-10-21 2021-10-05 Baidu Usa Llc Inaudible watermark enabled text-to-speech framework

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107808358A (en) * 2017-11-13 2018-03-16 携程计算机技术(上海)有限公司 Image watermark automatic testing method
CN111932431A (en) * 2020-07-07 2020-11-13 华中科技大学 Visible watermark removing method based on watermark decomposition model and electronic equipment
CN111918144A (en) * 2020-08-12 2020-11-10 桂林电子科技大学 Method for removing video watermark based on deep learning
CN112036300A (en) * 2020-08-31 2020-12-04 合肥工业大学 Moving target detection method based on multi-scale space-time propagation layer

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Convolutional Neural Network Architecture for Recovering Watermark Synchronization; Kim, Wook-Hyung et al.; SENSORS; 2020-11-03; full text *
Vehicle type recognition based on deep convolutional neural networks; Shi Lei et al.; Computer Science; 2018-05-15 (No. 05); full text *
Research on key technologies for detecting and removing visible video watermarks; Dong Hui; CNKI Master's Electronic Journals; 2021-02-15; full text *

Also Published As

Publication number Publication date
CN113379833A (en) 2021-09-10

Similar Documents

Publication Publication Date Title
CN113379833B (en) Image visible watermark positioning and segmenting method based on neural network
CN113283446B (en) Method and device for identifying object in image, electronic equipment and storage medium
CN111340047B (en) Image semantic segmentation method and system based on multi-scale feature and foreground and background contrast
CN110727819B (en) Method for retrieving scale-adaptive pathological full-section image database
CN114066871B (en) Method for training new coronal pneumonia focus area segmentation model
CN111080591A (en) Medical image segmentation method based on combination of coding and decoding structure and residual error module
CN113888536B (en) Printed matter double image detection method and system based on computer vision
CN112836748A (en) Casting identification character recognition method based on CRNN-CTC
CN114998713B (en) Pavement disease identification method, device and system, electronic equipment and storage medium
CN115527072A (en) Chip surface defect detection method based on sparse space perception and meta-learning
Zhou et al. Attention transfer network for nature image matting
CN114581646A (en) Text recognition method and device, electronic equipment and storage medium
CN107292255B (en) Handwritten number recognition method based on feature matrix similarity analysis
CN111209886B (en) Rapid pedestrian re-identification method based on deep neural network
CN113240585A (en) Image processing method and device based on generation countermeasure network and storage medium
CN110349119B (en) Pavement disease detection method and device based on edge detection neural network
CN112070714A (en) Method for detecting copied image based on local ternary counting characteristics
CN117036243A (en) Method, device, equipment and storage medium for detecting surface defects of shaving board
CN115035193A (en) Bulk grain random sampling method based on binocular vision and image segmentation technology
CN112686794B (en) Watermark removing method based on generation type countermeasure network
CN115861663B (en) Document image content comparison method based on self-supervision learning model
CN117274039A (en) Document image seal removing method
CN112613527A (en) Minimum quantization feature detection method based on unsupervised learning
CN114882225A (en) Certificate portrait segmentation method and system based on scanned image
CN118429636A (en) Pavement defect segmentation method and device based on bounding box

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant