CN115131391A - Image segmentation method and device, terminal equipment and computer readable storage medium - Google Patents


Info

Publication number
CN115131391A
CN115131391A (application CN202110325944.6A)
Authority
CN
China
Prior art keywords
image
segmentation
convolution
result
normalization
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110325944.6A
Other languages
Chinese (zh)
Inventor
肖云雷
刘阳兴
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan TCL Group Industrial Research Institute Co Ltd
Original Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan TCL Group Industrial Research Institute Co Ltd filed Critical Wuhan TCL Group Industrial Research Institute Co Ltd
Priority to CN202110325944.6A
Publication of CN115131391A
Legal status: Pending

Classifications

    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL (G PHYSICS; G06 COMPUTING; CALCULATING OR COUNTING)
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G06T7/174 Segmentation; Edge detection involving the use of two or more images
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20212 Image combination
    • G06T2207/20221 Image fusion; Image merging

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the technical field of image processing and provides an image segmentation method, an image segmentation device, a terminal device and a computer-readable storage medium. The method comprises the following steps: acquiring a first image and scaling the first image to obtain a second image; inputting the second image into a first segmentation network model for processing and outputting a first segmented image; and inputting the first image, the second image and the first segmented image into a second segmentation network model for processing and outputting a second segmented image whose image segmentation accuracy is higher than that of the first segmented image. Embodiments of the application can finely segment a high-definition large image.

Description

Image segmentation method and device, terminal equipment and computer readable storage medium
Technical Field
The present application belongs to the field of image processing technologies, and in particular, to an image segmentation method, an image segmentation device, a terminal device, and a computer-readable storage medium.
Background
Before deep learning became popular, researchers often detected the boundary of the sky region in an image with gradient-based algorithms and distinguished the sky region from the non-sky region with an image dark channel algorithm; however, these algorithms have poor segmentation accuracy, because object boundary lines, fine objects and holes are difficult to segment. In recent years more researchers have used neural networks to segment objects, but because the models are generally trained on small images (for example, images resized to 256x256 or 512x512), the segmentation effect on high-definition large images is poor.
Disclosure of Invention
Embodiments of the present application provide an image segmentation method, an image segmentation device, a terminal device, and a computer-readable storage medium, which can perform fine segmentation on a high-definition large image.
In a first aspect, an embodiment of the present application provides an image segmentation method, including:
acquiring a first image, and carrying out size scaling on the first image to obtain a second image;
inputting the second image into the first segmentation network model for processing, and outputting a first segmentation image;
and inputting the first image, the second image and the first segmentation image into a second segmentation network model for processing, and outputting a second segmentation image with higher image segmentation precision than the first segmentation image.
In a second aspect, an embodiment of the present application provides an image segmentation apparatus, including:
the image acquisition unit is used for acquiring a first image and carrying out size scaling on the first image to obtain a second image;
the first segmentation network model is used for inputting the second image into the first segmentation network model for processing and outputting a first segmentation image;
and the second segmentation network model is used for inputting the first image, the second image and the first segmentation image into the second segmentation network model for processing, and outputting a second segmentation image with higher image segmentation precision than the first segmentation image.
In a third aspect, an embodiment of the present application provides a terminal device, where the terminal device includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and the processor implements the method of any one of the first aspect when executing the computer program.
In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing a computer program, which when executed by a processor implements the method of any one of the first aspects.
In a fifth aspect, embodiments of the present application provide a computer program product, which, when run on a terminal device, causes the terminal device to perform the method of any one of the first aspect.
It is understood that the beneficial effects of the second aspect to the fifth aspect can be seen from the related description of the first aspect, and are not described herein again.
Compared with the prior art, the embodiment of the application has the beneficial effects that:
a second image, obtained by scaling the first image, is input into the first segmentation network model to generate a first segmented image; the first image, the second image and the first segmented image are input into a second segmentation network model for processing, and a second segmented image whose image segmentation accuracy is higher than that of the first segmented image is output; through two segmentations at different precisions, the fineness of the segmentation result can be improved, so that the high-definition large image can be finely segmented.
Some possible implementations of embodiments of the present application have the following beneficial effects:
the first segmentation network model uses FRN normalization; the input data are a resized picture (the second image) and the original picture (the first image), and the network converges quickly even when the batch size is 1; segmentation is then performed in combination with the second segmentation network model, which also uses FRN normalization, so that the segmentation result is finer and the fineness with which the high-definition large picture is segmented can be further improved;
image parameters of a background template are acquired, the first image is toned according to the image parameters, and a toned image with the changed tone is generated; a third image is generated from the second segmented image, the toned image and the background template, so that the overall color of the fused picture looks more natural;
the loss function is the mean square error, so the obtained segmentation map takes values between 0 and 1 and can be used directly to seamlessly fuse the background template.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
Fig. 1 is a schematic flowchart of an image segmentation method according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a first segmentation network model according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a convolution block lower structure according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a first feature network according to an embodiment of the present application;
FIG. 5 is a schematic structural diagram of an upper operation block according to an embodiment of the present application;
FIG. 6 is a schematic structural diagram of an end feature network provided in an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a convolution block upper structure according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of a second segmentation network model provided in an embodiment of the present application;
FIG. 9 is a flowchart illustrating a background changing method according to an embodiment of the present application;
fig. 10 is a schematic structural diagram of an image processing apparatus according to an embodiment of the present application;
fig. 11 is another schematic structural diagram of a first segmentation network model according to an embodiment of the present application;
fig. 12 is another schematic structural diagram of a terminal device according to an embodiment of the present application;
FIG. 13 is a diagram of an example of a first image provided by an embodiment of the present application;
FIG. 14 is a diagram illustrating an example of a second image provided by an embodiment of the present application;
FIG. 15 is a diagram of an example of a first segmented image provided by an embodiment of the present application;
FIG. 16 is a diagram of an example of a second segmented image provided by an embodiment of the present application;
FIG. 17 is a diagram illustrating an example sky template provided by an embodiment of the present application;
FIG. 18 is a diagram of an example of a third image provided by an embodiment of the present application;
FIG. 19(a) is another exemplary diagram of a first image provided by an embodiment of the present application;
FIG. 19(b) is an exemplary diagram of a second segmented image corresponding to FIG. 19(a);
FIG. 20(a) is another exemplary diagram of a third image provided by an embodiment of the present application;
FIG. 20(b) is an exemplary diagram of a first segmented image corresponding to FIG. 20(a);
FIG. 21(a) is a diagram of a third example of a first image provided by an embodiment of the present application;
FIG. 21(b) is an exemplary diagram of a second segmented image corresponding to FIG. 21(a);
FIG. 22(a) is a diagram of a third example of a third image provided by an embodiment of the present application;
FIG. 22(b) is an exemplary diagram of a first segmented image corresponding to FIG. 22(a).
Detailed Description
In order to make the technical problems to be solved, the technical solutions and the advantages of the present application clearer, the present application is further described in detail below with reference to FIGS. 1 to 22 and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
As used in this specification and the appended claims, the term "if" may be interpreted contextually as "when", "upon", "in response to determining" or "in response to detecting". Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted contextually to mean "upon determining", "in response to determining", "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, in the description of the present application and the appended claims, the terms "first," "second," "third," and the like are used to distinguish between descriptions and are not to be understood as indicating or implying relative importance.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather mean "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The present embodiment provides an image processing method, and in particular, an image processing method based on learnable guided filtering, where one application form is an image segmentation method, and another application form is a background changing method. The method of this embodiment may be applied to a mobile phone, a tablet computer, a wearable device, an in-vehicle device, an Augmented Reality (AR)/Virtual Reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a Personal Digital Assistant (PDA), and other terminal devices, and the specific type of the terminal device is not limited in this embodiment.
In this embodiment, the background is the sky. In other embodiments, the background may be a water surface, a wall, or another background, preferably one with relatively uniform content, as long as targeted training is performed.
Fig. 1 shows a schematic flowchart of an image processing method provided by the present embodiment, which can be applied to the above-described terminal device by way of example and not limitation. The image processing method of the present embodiment includes steps S1 to S3, which are image segmentation methods, specifically, image segmentation methods based on learnable guided filtering.
Step S1, acquiring a first image, and performing size scaling on the first image to obtain a second image.
The first image is an image with a background, and may be referred to as a first background-carrying image. The second image is also an image with a background, which may be referred to as a second background image. The second image is obtained by changing the size of the first image, in particular by scaling the first image.
In the present embodiment, referring to fig. 13, the first image is a picture with sky, I_h, which serves as the original image; the second image is obtained by resizing the picture with sky I_h (the first image), e.g. scaling the size of I_h to 512x512, resulting in a second image I_l (also called picture I_l). Of course, the second image I_l may also have other sizes depending on the actual situation.
The second image I_l may be obtained in the following specific ways: the second image may be prepared in advance and read directly from the storage means of the terminal device when it is needed; alternatively, the second image is obtained by processing the first image I_h input in real time.
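As a minimal illustration of step S1, the sketch below resizes the original picture with Pillow; the 512x512 target comes from this embodiment, while the file path and the choice of library are assumptions made only for the example.

```python
# Minimal sketch of step S1 (assumptions: Pillow is used; the file path is hypothetical).
from PIL import Image

def make_second_image(first_image_path, target_size=(512, 512)):
    # First image I_h: the original high-definition picture with sky.
    first_image = Image.open(first_image_path).convert("RGB")
    # Second image I_l: the first image scaled to the training resolution.
    second_image = first_image.resize(target_size, Image.BILINEAR)
    return first_image, second_image

first_image, second_image = make_second_image("sky_photo.jpg")
```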
In step S2, the second image is input to the first segmented network model and processed, and the first segmented image is output.
The first segmentation network model is a coarse segmentation network model and is used for coarsely segmenting the second image.
The second image I_l serves as input data of the first segmentation network model, i.e. the coarse segmentation network model, which processes the second image I_l and outputs a first segmented image O_l (also called the rough segmentation map O_l). In the process of obtaining the first segmented image O_l, the first segmentation network model performs a normalization operation.
In some embodiments, the first segmentation network model is a segmentation network model that employs an FRN normalization layer and the mean square error as the loss function; in other words, the first segmentation network model uses FRN (Filter Response Normalization) and a Mean Square Error (MSE) loss function to calculate the loss, and performs FRN normalization after each of its convolution operations. The above step S2 (inputting the second image into the first segmentation network model for processing and outputting the first segmented image) includes steps S21 to S23.
And step S21, inputting the second image into the first segmentation network model to perform convolution operation, FRN normalization operation, activation function operation and separation convolution operation to obtain image characteristics.
In some embodiments, the second image is input into a convolution block lower structure of the first segmentation network model. The convolution block lower structure can perform a convolution operation, an FRN normalization operation, an activation function operation and a separation convolution operation.
In the present embodiment, referring to fig. 2, there are five convolution block lower structures, namely the convolution block lower structures ConvBlockDown1 to ConvBlockDown5.
Each convolution block lower structure has the same structure and processes its input data in the same way. Please refer to fig. 3 for the specific structure of the convolution block lower structure of the present embodiment: Conv1x1 denotes a convolution operation with a convolution kernel size of 1; FRN is the normalization operation; ReLU6 is an activation function that limits the maximum output value to 6; DepthwiseConv 3x3/2 is a separation convolution operation with a convolution kernel size of 3 and a stride of 2; with a stride of 2, the width and height of the feature map after the convolution operation are each half of the original size.
It should be noted that the number of convolution block lower structures, the convolution kernel size of the convolution operation, the convolution kernel size of the separation convolution operation, and the stride may all be determined by training according to the actual application scenario.
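The FRN operation referred to above comes from the Filter Response Normalization literature; the patent does not give its formula, so the following PyTorch sketch is an assumed formulation (each channel is normalized by the mean of its squared activations over the spatial dimensions, then scaled and shifted by learned parameters), and the framework choice itself is an assumption.

```python
import torch
import torch.nn as nn

class FRN(nn.Module):
    """Filter Response Normalization layer (sketch; assumed formulation:
    y = gamma * x / sqrt(mean(x^2) + eps) + beta, computed per channel)."""
    def __init__(self, num_channels, eps=1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(1, num_channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, num_channels, 1, 1))
        self.eps = eps

    def forward(self, x):
        # nu2: mean of squared activations over the spatial dimensions (H, W).
        nu2 = x.pow(2).mean(dim=(2, 3), keepdim=True)
        x = x * torch.rsqrt(nu2 + self.eps)
        return self.gamma * x + self.beta
```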
Referring to fig. 3, the convolution block lower structure of the first segmentation network model processes the input data specifically as follows:
a convolution operation with a convolution kernel size of 1x1, an FRN normalization operation and a ReLU6 activation function operation are performed in sequence on the input data, such as the second image I_l, and a first normalization result is output;
a separation convolution operation, an FRN normalization operation and a ReLU6 activation function operation are performed in sequence on the first normalization result, and a second normalization result is output;
a convolution operation with a convolution kernel size of 1x1 and an FRN normalization operation are performed on the second normalization result to obtain the image features.
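A possible PyTorch rendering of this convolution block lower structure (fig. 3) is sketched below; the channel widths are hypothetical and the FRN class is the sketch given earlier, so this is an illustration under those assumptions rather than the patent's exact implementation.

```python
import torch.nn as nn
import torch.nn.functional as F

class ConvBlockDown(nn.Module):
    """Sketch of the convolution block lower structure (fig. 3):
    1x1 conv + FRN + ReLU6 -> 3x3/2 depthwise conv + FRN + ReLU6 -> 1x1 conv + FRN."""
    def __init__(self, in_ch, mid_ch, out_ch):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False)
        self.frn1 = FRN(mid_ch)
        # Depthwise (separation) convolution, kernel 3, stride 2: halves width and height.
        self.dwconv = nn.Conv2d(mid_ch, mid_ch, kernel_size=3, stride=2,
                                padding=1, groups=mid_ch, bias=False)
        self.frn2 = FRN(mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, out_ch, kernel_size=1, bias=False)
        self.frn3 = FRN(out_ch)

    def forward(self, x):
        x = F.relu6(self.frn1(self.conv1(x)))   # first normalization result
        x = F.relu6(self.frn2(self.dwconv(x)))  # second normalization result
        return self.frn3(self.conv2(x))         # image feature
```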
Referring to fig. 2, for the five convolution block lower structures, the second image I_l is input into the convolution block lower structure ConvBlockDown1 to generate a first image feature X1; the first image feature X1 is input into the convolution block lower structure ConvBlockDown2 and into the feature network; the convolution block lower structure ConvBlockDown2 generates a second image feature X2 based on the first image feature X1; the second image feature X2 is input into the convolution block lower structure ConvBlockDown3 and into the feature network; the convolution block lower structure ConvBlockDown3 generates a third image feature X3 based on the second image feature X2; the third image feature X3 is input into the convolution block lower structure ConvBlockDown4 and into the feature network; the convolution block lower structure ConvBlockDown4 generates a fourth image feature X4 based on the third image feature X3; the fourth image feature X4 is input into the convolution block lower structure ConvBlockDown5 and into the feature network; the convolution block lower structure ConvBlockDown5 generates a fifth image feature X5 based on the fourth image feature X4; the fifth image feature X5 is input into the feature network. The same applies when the number of convolution block lower structures is two, three, four, six, seven, or eight or more.
Thus, the second image I_l passes through the convolution block lower structures and the image features X1 to X5 are output. The image features X1 to X5 are feature maps.
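The cascade of the five ConvBlockDown structures of fig. 2 that turns the second image into the features X1 to X5 can then be sketched as follows; the channel progression is a hypothetical choice, not taken from the patent.

```python
class CoarseEncoder(nn.Module):
    """Sketch of the five-stage ConvBlockDown cascade of fig. 2 (hypothetical widths)."""
    def __init__(self, widths=(16, 24, 32, 64, 96)):
        super().__init__()
        chans = [3] + list(widths)
        self.blocks = nn.ModuleList(
            [ConvBlockDown(chans[i], chans[i + 1], chans[i + 1]) for i in range(5)]
        )

    def forward(self, second_image):
        features = []            # X1 ... X5, each half the resolution of the previous
        x = second_image
        for block in self.blocks:
            x = block(x)
            features.append(x)
        return features
```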
And step S22, performing up-sampling, convolution operation, FRN normalization operation, feature map addition and activation function operation on the image features to obtain feature network results.
The image features X1 to X5 obtained in step S21 are input into a feature network to generate a feature network result. The feature network can perform upsampling, convolution operations, FRN normalization operations, feature map addition, and activation function operations.
In some embodiments, referring to fig. 2, the feature network comprises a first feature network and an end feature network; the number of the first feature networks is two, and the number of the end feature networks is one. In other embodiments, the number of the first feature networks is one, three, four, five, or six or more, and is determined by training according to the actual application scenario.
The image features X1 to X5 output feature network results via the first feature network BiFPN1, the first feature network BiFPN2, and the end feature network BiFPNLast.
Each first feature network has the same configuration and processes its input data in the same way. The specific structure of the first feature network of some embodiments is shown in fig. 4: Add is a feature map addition, ReLU6 is an activation function, the lower operation blocks BlockDown1 to BlockDown4 have the same structure as the convolution block lower structure of fig. 3, and the specific structure of the upper operation blocks BlockUp1 to BlockUp4 is shown in fig. 5.
In fig. 5, the upsampling is a twofold linear upsampling (bilinear upsampling); after this operation, the width and height of the feature map are each twice the original size; Conv3x3 denotes a convolution operation with a convolution kernel size of 3; FRN is the normalization operation. The specific type of upsampling may be determined according to the actual situation.
Referring to fig. 5, each upper operation block BlockUp also processes its input data in the same way, as follows: a twofold linear upsampling is performed on the input data and an upsampling result is output; a convolution operation with a convolution kernel size of 3 and an FRN normalization operation are then performed on the upsampling result, and a normalization operation result is output.
The feature map operation block Add + ReLU6 is a block that performs a feature map addition operation and a ReLU6 activation function operation on input data.
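These two building blocks could be rendered in PyTorch as sketched below, under the assumption that all fused feature maps share one channel width so that the addition is valid (this detail is not specified in the patent); FRN is the layer sketched earlier.

```python
class BlockUp(nn.Module):
    """Sketch of the upper operation block (fig. 5): 2x bilinear upsample -> 3x3 conv -> FRN."""
    def __init__(self, channels):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.frn = FRN(channels)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=2, mode="bilinear", align_corners=False)
        return self.frn(self.conv(x))

def add_relu6(a, b):
    """Feature map operation block Add + ReLU6."""
    return F.relu6(a + b)
```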
Based on the foregoing, the first feature network can perform upsampling, convolution, FRN normalization, feature map addition, ReLU6 activation function operation, and split convolution operation. The first feature network carries out up-sampling, convolution operation, FRN normalization operation, feature map addition, ReLU6 activation function operation and separation convolution operation on image features X1-X5 to obtain a preliminary feature network result.
The process by which the first feature network processes the input data will now be described. Referring to fig. 4, the image feature X5 is input into the upper operation block BlockUp4 and into the first feature map operation block Add + ReLU6 on the right (counted from top to bottom in fig. 4); the output of the upper operation block BlockUp4 is input into the first feature map operation block Add + ReLU6 on the left (counted from top to bottom in fig. 4); the image feature X4 is input into the first feature map operation block Add + ReLU6 on the left and the second feature map operation block Add + ReLU6 on the right; the output of the first feature map operation block Add + ReLU6 on the left is input into the upper operation block BlockUp3 and the second feature map operation block Add + ReLU6 on the right; the output of the upper operation block BlockUp3 is input into the second feature map operation block Add + ReLU6 on the left; the image feature X3 is input into the second feature map operation block Add + ReLU6 on the left and the third feature map operation block Add + ReLU6 on the right; the output of the second feature map operation block Add + ReLU6 on the left is input into the upper operation block BlockUp2 and the third feature map operation block Add + ReLU6 on the right; the output of the upper operation block BlockUp2 is input into the third feature map operation block Add + ReLU6 on the left; the image feature X2 is input into the third feature map operation block Add + ReLU6 on the left and the fourth feature map operation block Add + ReLU6 on the right; the output of the third feature map operation block Add + ReLU6 on the left is input into the upper operation block BlockUp1 and the fourth feature map operation block Add + ReLU6 on the right; the output of the upper operation block BlockUp1 is input into the fourth feature map operation block Add + ReLU6 on the left; the image feature X1 is input into the fourth feature map operation block Add + ReLU6 on the left; the output H1 of the fourth feature map operation block Add + ReLU6 on the left is input into the lower operation block BlockDown1 and is also output externally; the output of the lower operation block BlockDown1 is input into the fourth feature map operation block Add + ReLU6 on the right; the output H2 of the fourth feature map operation block Add + ReLU6 on the right is input into the lower operation block BlockDown2 and is also output externally; the output of the lower operation block BlockDown2 is input into the third feature map operation block Add + ReLU6 on the right; the output H3 of the third feature map operation block Add + ReLU6 on the right is input into the lower operation block BlockDown3 and is also output externally; the output of the lower operation block BlockDown3 is input into the second feature map operation block Add + ReLU6 on the right; the output H4 of the second feature map operation block Add + ReLU6 on the right is input into the lower operation block BlockDown4 and is also output externally; the output of the lower operation block BlockDown4 is input into the first feature map operation block Add + ReLU6 on the right; the output H5 of the first feature map operation block Add + ReLU6 on the right is output externally.
Referring to fig. 2, image features X1 through X5 are input to a first feature network BiFPN1, generating first output results H1 through H5. The output results H1 to H5 are input to the first feature network BiFPN2, generating preliminary feature network results F1 to F5.
The preliminary feature network results F1 to F5 are input into the end feature network BiFPNLast, and the feature network result is generated.
Please refer to fig. 6 for the specific structure of the end feature network BiFPNLast: in this embodiment, the end feature network BiFPNLast also includes the upper operation blocks BlockUp1 to BlockUp4 and four feature map operation blocks Add + ReLU6. It can be seen that the end feature network includes upsampling, a convolution operation, an FRN normalization operation, feature map addition, and a ReLU6 activation function operation. Thus, the preliminary feature network results F1 to F5 are input into an end feature network that can perform upsampling, convolution operations, FRN normalization operations, feature map addition, and ReLU6 activation function operations.
The process by which the end feature network processes the input data is described below. Referring to fig. 6, the preliminary feature network result F5 is input into the upper operation block BlockUp4; the output of the upper operation block BlockUp4 is input into the first feature map operation block Add + ReLU6 (counted from top to bottom in fig. 6); the preliminary feature network result F4 is input into the first feature map operation block Add + ReLU6; the output of the first feature map operation block Add + ReLU6 is input into the upper operation block BlockUp3; the output of the upper operation block BlockUp3 is input into the second feature map operation block Add + ReLU6; the preliminary feature network result F3 is input into the second feature map operation block Add + ReLU6; the output of the second feature map operation block Add + ReLU6 is input into the upper operation block BlockUp2; the output of the upper operation block BlockUp2 is input into the third feature map operation block Add + ReLU6; the preliminary feature network result F2 is input into the third feature map operation block Add + ReLU6; the output of the third feature map operation block Add + ReLU6 is input into the upper operation block BlockUp1; the output of the upper operation block BlockUp1 is input into the fourth feature map operation block Add + ReLU6; the preliminary feature network result F1 is input into the fourth feature map operation block Add + ReLU6; the fourth feature map operation block Add + ReLU6 generates and outputs the feature network result.
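The top-down fusion just described can be sketched as follows, directly mirroring the data flow of fig. 6; as before, all five inputs are assumed to share one channel width so that the additions are valid, and the BlockUp and add_relu6 helpers are the sketches given above.

```python
class BiFPNLast(nn.Module):
    """Sketch of the end feature network (fig. 6): a pure top-down path that fuses
    the preliminary feature network results F1..F5 into one feature network result."""
    def __init__(self, channels):
        super().__init__()
        self.up4 = BlockUp(channels)
        self.up3 = BlockUp(channels)
        self.up2 = BlockUp(channels)
        self.up1 = BlockUp(channels)

    def forward(self, f1, f2, f3, f4, f5):
        x = add_relu6(f4, self.up4(f5))   # first Add + ReLU6 block
        x = add_relu6(f3, self.up3(x))    # second
        x = add_relu6(f2, self.up2(x))    # third
        x = add_relu6(f1, self.up1(x))    # fourth -> feature network result
        return x
```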
And step S23, performing up-sampling, convolution operation, FRN normalization operation and loss function operation on the feature network result, and outputting a first segmentation image.
The feature network result is input into the convolution block upper structure of the first segmentation network model, and the convolution block upper structure outputs the first segmented image. The convolution block upper structure can perform upsampling, a convolution operation, an FRN normalization operation, and the mean square error loss function operation.
Fig. 7 shows the specific structure of the convolution block upper structure ConvBlockUp: it contains a twofold linear upsampling, a convolution operation with a convolution kernel size of 3, an FRN normalization operation, a second convolution operation with a convolution kernel size of 3, and a sigmoid activation function.
Referring to fig. 7, the feature network result from the feature network is input into the convolution block upper structure ConvBlockUp, generating the first segmented image O_l (also called the rough segmentation map O_l); see fig. 15 for an example of the first segmented image O_l. Specifically, referring to fig. 7, the input feature network result is upsampled, specifically by twofold linear upsampling, and an upsampling result is output; a convolution operation with a convolution kernel size of 3 and an FRN normalization operation are performed on the upsampling result to obtain a convolution normalization result; a convolution operation with a convolution kernel size of 3 is performed on the convolution normalization result to obtain a convolution result; a classification operation is then performed on the convolution result with sigmoid as the activation function: a sigmoid function operation is applied to the convolution result, and the first segmented image O_l is output.
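A sketch of this convolution block upper structure follows; the intermediate channel width is an assumption, and the output is a one-channel map in [0, 1] as implied by the sigmoid.

```python
class ConvBlockUp(nn.Module):
    """Sketch of the convolution block upper structure (fig. 7)."""
    def __init__(self, in_ch, mid_ch, out_ch=1):
        super().__init__()
        self.conv1 = nn.Conv2d(in_ch, mid_ch, kernel_size=3, padding=1, bias=False)
        self.frn = FRN(mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, feature_network_result):
        x = F.interpolate(feature_network_result, scale_factor=2,
                          mode="bilinear", align_corners=False)
        x = self.frn(self.conv1(x))       # convolution normalization result
        x = self.conv2(x)                 # convolution result
        return torch.sigmoid(x)           # first segmented image O_l, values in [0, 1]
```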
In step S3, the first image, the second image, and the first segmented image are input into the second segmentation network model and processed, and the second segmented image, whose image segmentation accuracy is higher than that of the first segmented image, is output.
The second segmentation network model is a segmentation network model that employs an FRN normalization layer and guided filtering. In some embodiments, the second segmentation network model is a guided filtering fine segmentation model used for finely segmenting the image. Referring to fig. 8, the picture with sky I_h (the first image), the second image I_l and the first segmented image O_l are fed into the second segmentation network model to obtain the second segmented image O_h of the picture with sky I_h. The second segmented image O_h is the fine segmentation result, whose segmentation accuracy is higher than that of the first segmented image. See fig. 16 for an example of the second segmented image O_h.
Referring to fig. 8, in some embodiments, the aforementioned second segmentation network model is a learnable guided filtering fine segmentation model. The learnable guided filtering fine segmentation model can perform a specified convolution operation (representing a series of convolution operations), a hole convolution operation (Dilated Conv), a point-by-point convolution operation (Pointwise Conv Block), upsampling, and a linear operation (Linear Layer); the hole convolution operation is a 3x3 hole convolution, and the point-by-point convolution operation is a 1x1 convolution. The learnable guided filtering fine segmentation model employs FRN normalization; in particular, FRN normalization is performed after the various convolution operations, such as the aforementioned hole convolution operation and point-by-point convolution operation. The above step S3 (inputting the first image, the second image and the first segmented image into the second segmentation network model for processing, and outputting the second segmented image whose image segmentation accuracy is higher than that of the first segmented image) includes steps S31 to S39.
Step S31, inputting the first image I_h into the second segmentation network model and performing the convolution operation F(I) to obtain a first feature map G_h with the same size as the first image I_h.
F(I) is a series of convolution operations used to extract a feature map; it is implemented such that the extracted feature map G_h has the same size as I_h.
Step S32, performing the convolution operation F(I) on the second image I_l to obtain a second feature map G_l with the same size as the second image I_l.
Likewise, F(I) is implemented such that the extracted feature map G_l has the same size as the second image I_l.
Step S33, performing a hole convolution operation on the first segmented image O_l to obtain a first hole convolution result.
Step S34, performing a hole convolution operation on the second feature map G_l to obtain a second hole convolution result.
The hole convolution operation is performed on the first segmented image O_l and the second feature map G_l so that they have the same number of channels.
Step S35, performing a point-by-point convolution operation on the first hole convolution result to obtain a first guided filtering key parameter A_l.
Step S36, performing a point-by-point convolution operation on the second hole convolution result to obtain a second guided filtering key parameter b_l.
The first guided filtering key parameter A_l and the second guided filtering key parameter b_l are both guided filtering key parameters at the small-image scale, where the small image is the second image I_l.
Step S37, upsampling the first guided filtering key parameter A_l, specifically by twofold linear upsampling, to obtain a third guided filtering key parameter A_h.
Step S38, upsampling the second guided filtering key parameter b_l, specifically by twofold linear upsampling, to obtain a fourth guided filtering key parameter b_h.
The third guided filtering key parameter A_h and the fourth guided filtering key parameter b_h are the guided filtering key parameters at the large-image scale, where the large image is the first image I_h.
Step S39, determining, from the first feature map G_h, the third guided filtering key parameter A_h and the fourth guided filtering key parameter b_h, the second segmented image O_h whose image segmentation accuracy is higher than that of the first segmented image O_l.
The second segmented image O_h is the fine segmentation result O_h.
In step S39, the second segmented image O_h is obtained by a linear operation; the formula of the linear operation is the following formula (1):
O_h = A_h * G_h + b_h    (1)
In this way, the fine segmentation result O_h of the picture with sky I_h is obtained.
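Steps S31 to S39 can be summarized by the sketch below: the same feature extractor F(I) is applied to I_h and I_l, hole (dilated) and point-by-point convolutions produce the key parameters A_l and b_l at the small-image scale, these are bilinearly upsampled to A_h and b_h, and formula (1) combines them with G_h. The feature extractor, the channel widths, and the single-channel guidance map are assumptions made to keep the example compact, and the FRN layers that follow each convolution are omitted for brevity.

```python
class GuidedFilterHead(nn.Module):
    """Sketch of the learnable guided filtering fine segmentation (steps S31-S39).
    Assumption: F(I) yields a single-channel guidance map, so the linear
    combination of formula (1) directly gives a one-channel O_h."""
    def __init__(self, feat_extractor, hidden_ch=32):
        super().__init__()
        self.feat = feat_extractor                       # F(I), shared by I_h and I_l
        # 3x3 hole (dilated) convolutions bring O_l and G_l to the same channel count.
        self.dil_o = nn.Conv2d(1, hidden_ch, kernel_size=3, padding=2, dilation=2)
        self.dil_g = nn.Conv2d(1, hidden_ch, kernel_size=3, padding=2, dilation=2)
        # 1x1 point-by-point convolutions produce the guided filtering key parameters.
        self.pw_a = nn.Conv2d(hidden_ch, 1, kernel_size=1)
        self.pw_b = nn.Conv2d(hidden_ch, 1, kernel_size=1)

    def forward(self, first_image, second_image, rough_seg):
        g_h = self.feat(first_image)                     # S31: G_h, same size as I_h
        g_l = self.feat(second_image)                    # S32: G_l, same size as I_l
        a_l = self.pw_a(self.dil_o(rough_seg))           # S33 + S35: key parameter A_l
        b_l = self.pw_b(self.dil_g(g_l))                 # S34 + S36: key parameter b_l
        # S37/S38: bilinear upsampling to the large-image scale
        # (the patent describes twofold linear upsampling).
        size = first_image.shape[2:]
        a_h = F.interpolate(a_l, size=size, mode="bilinear", align_corners=False)
        b_h = F.interpolate(b_l, size=size, mode="bilinear", align_corners=False)
        return a_h * g_h + b_h                           # S39: O_h = A_h * G_h + b_h
```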
Referring to fig. 9, the image processing method based on learnable guided filtering of the present embodiment further includes steps S4 to S6. In this way, the method of this embodiment becomes a method for replacing the background by segmenting an image based on learnable guided filtering, and specifically a method for replacing the background by finely segmenting an image based on learnable guided filtering.
Step S4, the background template and the image parameters of the background template are acquired.
In some embodiments, the background template is a sky template, SkyTemplate. An example of the sky template SkyTemplate refers to fig. 17.
The image parameters are image parameters matched to the background template. Acquiring the image parameters of the background template may, for example, be implemented as follows: the brightness and color balance parameters matched to the sky template SkyTemplate (the background template) are obtained with image processing software, such as Photoshop. The image parameters include brightness and color balance parameters.
Step S5, performing image toning on the first image according to the image parameters to obtain a designated image.
The input picture (namely the first image) is toned according to the image parameters to obtain a picture imgColorChange with the changed tone; the picture imgColorChange is the toned image. The toned image is the designated image, i.e. the image to be fused.
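The patent only states that brightness and color balance parameters matched to the sky template are applied; the exact toning operation is not specified, so the sketch below assumes a very simple interpretation (a per-channel gain plus a global brightness offset) purely for illustration.

```python
import numpy as np

def tone_first_image(first_image, brightness=0.0, channel_gain=(1.0, 1.0, 1.0)):
    """Hypothetical toning step S5: a per-channel gain plus a brightness offset
    is assumed here only as an illustration of applying the image parameters."""
    img = np.asarray(first_image, dtype=np.float32)          # H x W x 3, values 0..255
    toned = img * np.asarray(channel_gain, dtype=np.float32) + brightness
    return np.clip(toned, 0, 255).astype(np.uint8)           # imgColorChange
```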
In step S6, a third image is determined based on the second segmented image, the background template, and the designated image.
In some embodiments, step S6 is: acquiring an image fusion formula, and performing image fusion on the second segmented image, the background template and the designated image with the image fusion formula to obtain the third image. Illustratively, the second segmented image O_h, the designated image imgColorChange (i.e. the aforementioned toned image) and the sky template SkyTemplate (i.e. the background template) are fused according to the following formula (2) to form a new picture with the replaced sky, NewImg_ij (i.e. the third image):
NewImg_ij = imgColorChange * (255 - O_h) / 255 + SkyTemplate * O_h / 255    (2)
It should be noted that if the sky template SkyTemplate is a series of dynamic pictures, a dynamic sky-changing effect can be synthesized.
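Formula (2) amounts to per-pixel alpha blending of the toned image and the sky template; a NumPy sketch follows, under the assumption that O_h has been scaled to the 0-255 range used in the formula.

```python
def change_sky(img_color_change, sky_template, o_h):
    """Sketch of formula (2): alpha-blend the toned image and the sky template
    using the fine segmentation map O_h (assumed scaled to 0..255)."""
    fg = np.asarray(img_color_change, dtype=np.float32)
    bg = np.asarray(sky_template, dtype=np.float32)
    alpha = np.asarray(o_h, dtype=np.float32)[..., None]     # H x W x 1
    new_img = fg * (255.0 - alpha) / 255.0 + bg * alpha / 255.0
    return np.clip(new_img, 0, 255).astype(np.uint8)         # NewImg
```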
According to the method, first the background template and the image parameters of the background template are acquired, and the first image is toned according to the image parameters to obtain the designated image; then the designated image and the background template are segmented based on the second segmented image O_h, and the two segmentation results are fused together to form the third image, so that the background in the third image is the same as the background in the background template. The advantage of this is that the overall color of the fused picture looks more natural. An example of the third image is shown in fig. 18.
For image fusion, Poisson fusion and image matting are commonly used. Poisson fusion directly changes the color of the sky template and therefore does not meet the requirement; the image matting process is complex and requires a trimap first, after which a matting algorithm (such as shared matting, global matting, etc.) is used to obtain an alpha map for fusion. In contrast, in the above embodiment the loss function is the mean square error, so the obtained segmentation map takes values between 0 and 1 and can be used directly to seamlessly fuse the sky template (i.e. the background template).
The image processing method based on the learnable guided filtering of the embodiment further includes a training process of the model.
The training process comprises the following steps:
computing the loss between the labels and the second segmented image O_h (i.e. the fine segmentation result O_h) using the mean square error loss function to generate a loss result, where the labels are the labels corresponding to the training samples used in the training process;
updating the overall network parameters according to the loss result, including updating the parameters of the first and/or second segmentation network models, for example updating the parameters of at least one of the convolution block lower structure, the feature network, the convolution block upper structure and the second segmentation network model. A sketch of such a training step is given below.
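The joint training step can be sketched as a standard regression-style update: compute the mean square error between the label mask and the fine segmentation result O_h, back-propagate, and let the optimizer update the parameters of both segmentation network models. The optimizer and a batch size of 1 are assumptions consistent with the discussion below.

```python
import torch

def train_step(coarse_model, fine_model, optimizer, first_image, second_image, label):
    """One joint training step (sketch): MSE between the label and O_h updates
    the overall network parameters. Batch size is assumed to be 1."""
    optimizer.zero_grad()
    rough_seg = coarse_model(second_image)                         # O_l
    fine_seg = fine_model(first_image, second_image, rough_seg)    # O_h
    loss = torch.nn.functional.mse_loss(fine_seg, label)
    loss.backward()
    optimizer.step()
    return loss.item()
```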
Common segmentation networks include FCN (Fully Convolutional Networks), UNet, SegNet, and the like. In the training process of these segmentation networks, the input data of the same batch are resized to a uniform size, the batch size is greater than 1, BN (Batch Normalization) is used in the network structure to accelerate network convergence, and the loss function is generally chosen to be the cross entropy. To improve the segmentation result, some networks incorporate fine segmentation models, such as DeepLab combined with the CRF (Conditional Random Field) algorithm; however, because the CRF cannot be trained jointly with the segmentation network, researchers proposed training the CRFasRNN model jointly with the segmentation model to obtain better results, where RNN refers to a Recurrent Neural Network. For the current image segmentation technologies, applying them directly to a high-definition large image gives poor segmentation, because its distribution differs from that of the small images in the training set. If the high-definition large image is resized to the training-set image size and then segmented by the model, the effect is also poor, mainly for two reasons: first, the resizing may lose some details of the high-definition large image; second, after the picture is resized back to the large size, a blur phenomenon occurs.
In the above embodiment, the second image, obtained by resizing the first image (the original image), is input into the first segmentation network model (the coarse segmentation network), which uses FRN normalization and calculates the loss with the mean square error loss function, and the first segmented image (the coarse segmentation result) is generated; the first image, the second image and the first segmented image are input into the second segmentation network model (the learnable guided filtering fine segmentation model), which also uses FRN normalization, and the second segmented image (the fine segmentation result), whose image segmentation accuracy is higher than that of the first segmented image, is generated. The second segmentation network model uses FRN normalization and the mean square error loss function, the input data are the resized picture (the second image) and the original picture (the first image), and the network converges quickly even when the batch size is 1. With the mean square error as the loss function, the image segmentation problem can be treated as a regression problem, so small details in the picture can be segmented; joint training is performed in combination with the second segmentation network model, which uses FRN normalization. In this way, the segmentation result is more refined, and the high-definition large image can be segmented more finely.
The embodiment can reduce the amount of computation, avoid block artifacts and improve the running speed.
The embodiment can be used to build interesting applications, add selling points to products, and increase the segmentation accuracy of images.
With the above embodiment, practical application includes steps S1 to S6; testing the segmentation network (or model) includes steps S1 to S3; and training the segmentation network, on the basis of steps S1 to S3, uses the MSE loss function to compute the loss between the labels and the fine segmentation result O_h in order to update the overall network parameters.
Fig. 10 shows a block diagram of a structure of an image processing apparatus based on guided filtering provided in an embodiment of the present application, corresponding to the method described in the above embodiment, and only shows portions related to the embodiment of the present application for convenience of description.
Referring to fig. 10, the image processing apparatus based on guided filtering is specifically an image segmentation apparatus based on guided filtering, and the apparatus includes an image acquisition unit 1, a first segmentation network model 2, a second segmentation network model 3, a background template processing unit 4, a toning unit 5, an image fusion unit 6, and a training unit 7. The background template processing unit 4, the toning unit 5 and the image fusion unit 6 together serve as an image fusion model.
The image acquisition unit 1 is configured to implement the aforementioned step S1; the first segmentation network model 2 is used to implement the aforementioned step S2; the second segmentation network model 3 is used to implement the aforementioned step S3; the background template processing unit 4 is configured to implement the aforementioned step S4; the toning unit 5 is used to implement the aforementioned step S5; the image fusion unit 6 is configured to implement the foregoing step S6; the training unit 7 is used to implement the aforementioned training process.
Referring to fig. 11, the first segmentation network model 2 includes a convolution block lower structure 21, a feature network 22, and a convolution block upper structure 23. The convolution block lower structure 21 is used to implement the aforementioned step S21; the feature network 22 is used to implement the aforementioned step S22; the convolution block upper structure 23 is used to implement the aforementioned step S23.
The second segmentation network model 3 is a learnable guided filtering fine segmentation model and can perform a convolution operation, a hole convolution operation, a point-by-point convolution operation, upsampling and a linear operation.
It should be noted that, for the information interaction, execution process, and other contents between the above-mentioned devices/units, the specific functions and technical effects thereof are based on the same concept as those of the embodiment of the method of the present application, and specific reference may be made to the part of the embodiment of the method, which is not described herein again.
Fig. 12 is a schematic structural diagram of a computing device according to an embodiment of the present application. Referring to fig. 12, the terminal device 12 of this embodiment includes: at least one processor 120 (only one shown in fig. 12), a memory 121, and a computer program 122 stored in the memory 121 and executable on the at least one processor 120; the steps in any of the various embodiments of the learnable guided filtering based image processing method described above are implemented when the computer program 122 is executed by the processor 120.
The terminal device 12 may be a desktop computer, a notebook, a palm computer, a cloud server, or other computing devices. The computing device may include, but is not limited to, a processor 120 and a memory 121. Those skilled in the art will appreciate that fig. 12 is merely an example of a terminal device 12 and does not constitute a limitation of terminal device 12 and may include more or fewer components than shown, or some components may be combined, or different components, such as input output devices, network access devices, buses, etc.
The processor 120 may be a Central Processing Unit (CPU); the processor 120 may also be another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The memory 121 may be an internal storage unit of the terminal device 12 in some embodiments, such as a hard disk or a memory of the terminal device 12. The memory 121 may also be an external storage device of the terminal device 12 in other embodiments, such as a plug-in hard disk provided on the terminal device 12, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like. Further, the memory 121 may also include both an internal storage unit of the terminal device 12 and an external storage device. The memory 121 is used for storing an operating system, an application program, a Boot Loader (Boot Loader), data, and other programs, such as program codes of a computer program. The memory 121 may also be used to temporarily store data that has been output or is to be output.
Illustratively, the computer program 122 may be divided into one or more modules/units, which are stored in the memory 121 and executed by the processor 120 to accomplish the present embodiment. One or more of the modules/units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 122 in the terminal device 12.
Tests and experimental verification were performed on the above embodiments. The configuration of the computing device (machine test platform) used is: an i5-8250U processor, 16 GB memory, a GTX 1050 4G graphics card and a Windows 10 system.
Fig. 19(a) is another example of the first image (the original image); fig. 19(b) is an exemplary diagram of the second segmented image (the fine segmentation result) corresponding to fig. 19(a); fig. 20(a) is another example of the third image (a picture with the sky changed) corresponding to fig. 19(a); fig. 20(b) is an exemplary diagram of the first segmented image (the coarse segmentation result) corresponding to fig. 20(a).
Fig. 21(a) is a third example of the first image (the original image); fig. 21(b) is an exemplary diagram of the second segmented image corresponding to fig. 21(a); fig. 22(a) is a third example of the third image (a picture with the sky changed) corresponding to fig. 21(a); fig. 22(b) is an exemplary diagram of the first segmented image corresponding to fig. 22(a).
As can be seen from these pictures, the overall effect of the embodiment is excellent, and so is the segmentation accuracy.
It should be understood that, the sequence numbers of the steps in the foregoing embodiments do not imply an execution sequence, and the execution sequence of each process should be determined by its function and inherent logic, and should not constitute any limitation to the implementation process of the embodiments of the present application.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules, so as to perform all or part of the functions described above. Each functional unit and module in the embodiments may be integrated in one processing unit, or each unit may exist alone physically, or two or more units are integrated in one unit, and the integrated unit may be implemented in a form of hardware, or in a form of software functional unit. In addition, specific names of the functional units and modules are only for convenience of distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working processes of the units and modules in the system may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The aforementioned integrated units, if implemented in the form of software functional units and sold or used as separate products, may be stored in a computer readable storage medium. Based on such understanding, all or part of the processes in the methods of the embodiments described above may be implemented by a computer program, which may be stored in a computer-readable storage medium, to instruct related hardware; the computer program may, when being executed by a processor, realize the steps of the respective method embodiments described above. Wherein the computer program comprises computer program code, which may be in the form of source code, object code, an executable file or some intermediate form, etc. The computer readable medium includes: any entity or device capable of carrying computer program code to an apparatus/terminal device, recording medium, computer Memory, Read-Only Memory (ROM), Random-Access Memory (RAM), electrical carrier wave signals, telecommunications signals, and software distribution medium. Such as a usb-drive, a removable hard drive, a magnetic or optical disk, etc. In some jurisdictions, computer-readable media may not be an electrical carrier signal or a telecommunications signal in accordance with legislative and proprietary practices.
Embodiments of the present application also provide a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method embodiments.
Embodiments of the present application further provide a computer program product which, when run on a terminal device such as a mobile terminal, enables the terminal device to implement the steps of the above method embodiments.
In the above embodiments, each embodiment is described with its own emphasis; for parts that are not detailed or described in a certain embodiment, reference may be made to the related descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/device and method may be implemented in other ways. For example, the above-described apparatus/device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (13)

1. An image segmentation method, comprising:
acquiring a first image, and carrying out size scaling on the first image to obtain a second image;
inputting the second image into a first segmentation network model for processing, and outputting a first segmentation image;
and inputting the first image, the second image and the first segmentation image into a second segmentation network model for processing, and outputting a second segmentation image with higher image segmentation precision than the first segmentation image.
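For illustration only, the three steps of claim 1 can be read as a coarse-to-fine inference routine. The following sketch assumes PyTorch modules coarse_net and refine_net, a 512×512 working size and BGR uint8 input; none of these details are specified by the claim, and the sketch is not the claimed implementation itself.

```python
import cv2
import numpy as np
import torch

def segment_image(first_image_bgr, coarse_net, refine_net, work_size=512):
    """Two-stage segmentation: a coarse pass on a downscaled copy, then a fine pass.

    coarse_net and refine_net are assumed torch.nn.Module instances that take
    NCHW float tensors in [0, 1]; this mirrors the claim structure only loosely.
    """
    # Step 1: size-scale the first image to obtain the second image.
    second_image = cv2.resize(first_image_bgr, (work_size, work_size),
                              interpolation=cv2.INTER_LINEAR)

    def _to_tensor(img):
        return torch.from_numpy(
            img.astype(np.float32).transpose(2, 0, 1) / 255.0).unsqueeze(0)

    with torch.no_grad():
        # Step 2: coarse segmentation on the downscaled second image.
        first_seg = coarse_net(_to_tensor(second_image))

        # Step 3: refinement conditioned on the original image, the downscaled
        # image and the coarse mask; outputs a higher-precision mask.
        second_seg = refine_net(_to_tensor(first_image_bgr),
                                _to_tensor(second_image),
                                first_seg)
    return first_seg, second_seg
```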
2. The method of claim 1, wherein the first segmentation network model is a segmentation network model that employs an FRN normalization layer and uses a mean square error as its loss function; and the second segmentation network model is a segmentation network model that employs an FRN normalization layer and guided filtering.
3. The method of claim 1, wherein the method further comprises:
acquiring a background template and image parameters of the background template;
carrying out image toning on the first image according to the image parameters to obtain a designated image;
and determining a third image according to the second segmentation image, the background template and the designated image.
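The claim does not specify which image parameters are used for the toning. One plausible, purely illustrative reading is a per-channel mean/standard-deviation transfer toward the background template, sketched below; the strength parameter and the choice of statistics are added assumptions rather than claimed features.

```python
import numpy as np

def tone_to_template(first_image, template, strength=1.0):
    """Shift the per-channel mean/std of first_image toward those of template.

    This is only one possible interpretation of "image toning according to the
    image parameters of the background template"; the claim leaves the exact
    parameters unspecified.
    """
    src = first_image.astype(np.float32)
    ref = template.astype(np.float32)
    out = np.empty_like(src)
    for c in range(3):
        s_mean, s_std = src[..., c].mean(), src[..., c].std() + 1e-6
        r_mean, r_std = ref[..., c].mean(), ref[..., c].std() + 1e-6
        matched = (src[..., c] - s_mean) / s_std * r_std + r_mean
        out[..., c] = src[..., c] * (1 - strength) + matched * strength
    return np.clip(out, 0, 255).astype(np.uint8)
```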
4. The method of claim 3, wherein determining a third image from the second segmented image, the background template, and the designated image comprises:
acquiring an image fusion formula;
and carrying out image fusion on the second segmentation image, the background template and the designated image by using the image fusion formula to obtain a third image.
5. The method of claim 4, wherein the image fusion formula is:
NewImg_ij = imgColorChange * (255 - O_h) / 255 + SkyTemplate * O_h / 255
wherein NewImg_ij is the third image, imgColorChange is the designated image, O_h is the second segmentation image, and SkyTemplate is the background template.
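Read as per-pixel alpha blending in which the second segmentation image O_h acts as an 8-bit alpha mask, the formula of claim 5 can be evaluated directly. The sketch below assumes uint8 inputs of equal height and width, which the claim implies but does not state.

```python
import numpy as np

def fuse(img_color_change, sky_template, o_h):
    """NewImg = imgColorChange*(255 - O_h)/255 + SkyTemplate*O_h/255.

    o_h is the second segmentation image used as an 8-bit alpha mask:
    where o_h is 255 the background template dominates, where it is 0
    the toned (designated) image is kept.
    """
    alpha = o_h.astype(np.float32) / 255.0
    if alpha.ndim == 2:                      # broadcast a single-channel mask
        alpha = alpha[..., None]
    new_img = (img_color_change.astype(np.float32) * (1.0 - alpha)
               + sky_template.astype(np.float32) * alpha)
    return np.clip(new_img, 0, 255).astype(np.uint8)
```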
6. The method of claim 2, wherein inputting the second image into a first segmented network model for processing and outputting a first segmented image comprises:
inputting the second image into a first segmentation network model to perform convolution operation, FRN normalization operation, activation function operation and separation convolution operation to obtain image characteristics;
performing up-sampling, convolution operation, FRN normalization operation, feature map addition and activation function operation on the image features to obtain a feature network result;
and performing up-sampling, convolution operation, FRN normalization operation and loss function operation on the characteristic network result, and outputting a first segmentation image.
7. The method of claim 6, wherein inputting the second image into a first segmentation network model for convolution, FRN normalization, activation function, and separation convolution operations to obtain image features comprises:
inputting the second image into a first segmentation network model to perform convolution operation, FRN normalization operation and activation function operation to obtain a first normalization result;
performing separation convolution operation, FRN normalization operation and activation function operation on the first normalization result to obtain a second normalization result;
and carrying out convolution operation and FRN normalization operation on the second normalization result to obtain image characteristics.
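Claims 6 and 7 repeatedly pair convolution with an FRN normalization operation and an activation function operation. A minimal, non-authoritative PyTorch rendering of one such encoder block is given below, using the published Filter Response Normalization formulation (per-channel second moment over the spatial dimensions) with its companion TLU activation; the stride, kernel sizes and channel counts are illustrative assumptions, not claimed values.

```python
import torch
import torch.nn as nn

class FRN(nn.Module):
    """Filter Response Normalization followed by the TLU activation."""
    def __init__(self, channels, eps=1e-6):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(1, channels, 1, 1))
        self.beta = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.tau = nn.Parameter(torch.zeros(1, channels, 1, 1))
        self.eps = eps

    def forward(self, x):
        nu2 = x.pow(2).mean(dim=(2, 3), keepdim=True)        # per-channel 2nd moment
        x = x * torch.rsqrt(nu2 + self.eps)                   # normalize
        return torch.max(self.gamma * x + self.beta, self.tau)  # affine + TLU

class EncoderBlock(nn.Module):
    """Conv -> FRN/TLU -> depthwise-separable conv -> FRN/TLU, loosely following claims 6-7."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1, bias=False)
        self.frn1 = FRN(out_ch)
        # "separation convolution" read as depthwise conv followed by a pointwise conv
        self.depthwise = nn.Conv2d(out_ch, out_ch, 3, padding=1, groups=out_ch, bias=False)
        self.pointwise = nn.Conv2d(out_ch, out_ch, 1, bias=False)
        self.frn2 = FRN(out_ch)

    def forward(self, x):
        x = self.frn1(self.conv(x))
        return self.frn2(self.pointwise(self.depthwise(x)))
```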
8. The method of claim 7, wherein performing up-sampling, convolution operation, FRN normalization operation, feature map addition and activation function operation on the image features to obtain a feature network result comprises:
performing up-sampling, convolution operation, FRN normalization operation, feature map addition, activation function operation and separation convolution operation on the image features to obtain a preliminary feature network result;
and performing up-sampling, convolution operation, FRN normalization operation, feature map addition and activation function operation on the preliminary feature network result to obtain a feature network result.
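One way to read claim 8 is a decoder step of upsampling, convolution, FRN normalization and an element-wise feature-map addition with an encoder skip connection. The sketch below reuses the FRN module from the previous sketch; placing the addition after the activation is an assumption, since the claim does not fix the exact order.

```python
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    """Upsample -> 3x3 conv -> FRN/TLU -> add skip feature map (one reading of claim 8).

    FRN is the Filter Response Normalization module defined in the previous sketch.
    """
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, 3, padding=1, bias=False)
        self.frn = FRN(out_ch)

    def forward(self, x, skip):
        x = F.interpolate(x, size=skip.shape[2:], mode="bilinear", align_corners=False)
        return self.frn(self.conv(x)) + skip   # feature-map addition (skip connection)
```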
9. The method of claim 8, wherein performing up-sampling, convolution operation, FRN normalization operation and loss function operation on the feature network result and outputting a first segmentation image comprises:
performing up-sampling on the feature network result to obtain an up-sampling result;
performing convolution operation and FRN normalization operation on the up-sampling result to obtain a convolution normalization result;
performing convolution operation on the convolution normalization result to obtain a convolution result;
and carrying out classification operation on the convolution result and outputting a first segmentation image.
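Claim 9's output path (upsampling, convolution plus FRN normalization, a further convolution and a classification operation) can be sketched as a small output head. The FRN module from the earlier sketch is reused; the single-class sigmoid classification and the upsampling factor are assumptions for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SegmentationHead(nn.Module):
    """Upsample -> conv + FRN -> 1x1 conv -> classification (one reading of claim 9).

    FRN is the module defined in the sketch after claim 7; mid_ch, num_classes
    and scale are illustrative assumptions.
    """
    def __init__(self, in_ch, mid_ch=32, num_classes=1, scale=2):
        super().__init__()
        self.scale = scale
        self.conv1 = nn.Conv2d(in_ch, mid_ch, 3, padding=1, bias=False)
        self.frn = FRN(mid_ch)
        self.conv2 = nn.Conv2d(mid_ch, num_classes, 1)

    def forward(self, x):
        x = F.interpolate(x, scale_factor=self.scale, mode="bilinear", align_corners=False)
        x = self.frn(self.conv1(x))          # convolution + FRN normalization
        return torch.sigmoid(self.conv2(x))  # classification -> first segmentation image
```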
10. The method of any one of claims 6 to 9, wherein inputting the first image, the second image and the first segmentation image into a second segmentation network model for processing and outputting a second segmentation image with higher image segmentation precision than the first segmentation image comprises:
inputting the first image into a second segmentation network model for convolution operation to obtain a first feature map with the same size as the first image;
performing convolution operation on the second image to obtain a second feature map with the same size as the second image;
performing a hole convolution operation on the first segmentation image to obtain a first hole convolution result;
performing hole convolution operation on the second characteristic diagram to obtain a second hole convolution result;
performing point-by-point convolution operation on the first void convolution result to obtain a first guide filtering key parameter;
performing point-by-point convolution operation on the second cavity convolution result to obtain a second guide filtering key parameter;
performing upsampling on the first guide filtering key parameter to obtain a third guide filtering key parameter;
performing upsampling on the second guide filtering key parameter to obtain a fourth guide filtering key parameter;
and determining a second segmentation image with higher image segmentation precision than the first segmentation image according to the first feature map, the third guide filtering key parameter and the fourth guide filtering key parameter.
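Claim 10 reads like a deep-guided-filter style refinement: dilated ("hole") and point-by-point convolutions on the low-resolution branch predict the two linear coefficients of a guided filter, which are upsampled and applied to full-resolution guide features. The sketch below is one possible realization of that reading; the channel counts, dilation rate, the mapping of the two key parameters onto the coefficients A and b, and the final 1×1 projection are assumptions, not claimed details.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GuidedRefinement(nn.Module):
    """Guided-filter-style reading of claim 10: predict per-pixel linear coefficients
    (A, b) at low resolution, upsample them, and apply out = A * guide + b at full
    resolution."""
    def __init__(self, feat_ch=16, mask_ch=1, dilation=2):
        super().__init__()
        self.guide_full = nn.Conv2d(3, feat_ch, 3, padding=1)   # first feature map
        self.guide_low = nn.Conv2d(3, feat_ch, 3, padding=1)    # second feature map
        # "hole" (dilated) convolutions on the coarse mask and on the low-res features
        self.dil_mask = nn.Conv2d(mask_ch, feat_ch, 3, padding=dilation, dilation=dilation)
        self.dil_feat = nn.Conv2d(feat_ch, feat_ch, 3, padding=dilation, dilation=dilation)
        # point-by-point (1x1) convolutions producing the guided-filter key parameters
        self.to_A = nn.Conv2d(feat_ch, feat_ch, 1)
        self.to_b = nn.Conv2d(feat_ch, feat_ch, 1)
        self.project = nn.Conv2d(feat_ch, mask_ch, 1)           # final mask logits

    def forward(self, first_image, second_image, first_seg):
        guide = self.guide_full(first_image)                    # full resolution
        low = self.guide_low(second_image)                      # low resolution
        A_low = self.to_A(self.dil_mask(first_seg))             # from the coarse mask
        b_low = self.to_b(self.dil_feat(low))                   # from low-res features
        size = guide.shape[2:]
        A = F.interpolate(A_low, size=size, mode="bilinear", align_corners=False)
        b = F.interpolate(b_low, size=size, mode="bilinear", align_corners=False)
        return torch.sigmoid(self.project(A * guide + b))       # refined, full-res mask
```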
11. An image segmentation apparatus, comprising:
the image acquisition unit is used for acquiring a first image and carrying out size scaling on the first image to obtain a second image;
the first segmentation network model, configured to process the second image input thereto and output a first segmentation image;
and the second segmentation network model, configured to process the first image, the second image and the first segmentation image input thereto, and output a second segmentation image with higher image segmentation precision than the first segmentation image.
12. A terminal device, characterized in that the terminal device comprises a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the method according to any of claims 1 to 10 when executing the computer program.
13. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program which, when executed by a processor, implements the method of any one of claims 1 to 10.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110325944.6A CN115131391A (en) 2021-03-26 2021-03-26 Image segmentation method and device, terminal equipment and computer readable storage medium

Publications (1)

Publication Number Publication Date
CN115131391A true CN115131391A (en) 2022-09-30

Family

ID=83373836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110325944.6A Pending CN115131391A (en) 2021-03-26 2021-03-26 Image segmentation method and device, terminal equipment and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN115131391A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination