CN113052755A - High-resolution image intelligent matting method based on deep learning - Google Patents
- Publication number: CN113052755A (application CN201911382539.7A)
- Authority: CN (China)
- Prior art keywords: image, matting, main body, neural network, target main body
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T3/04 — Context-preserving transformations, e.g. by using an importance map (geometric image transformations in the plane of the image, G06T3/00)
- G06N3/045 — Combinations of networks (neural network architecture, G06N3/04)
- G06N3/048 — Activation functions (neural network architecture, G06N3/04)
- G06N3/084 — Backpropagation, e.g. using gradient descent (neural network learning methods, G06N3/08)
- G06T5/70 — Denoising; smoothing (image enhancement or restoration, G06T5/00)
- G06T7/11 — Region-based segmentation (image analysis; segmentation and edge detection, G06T7/10)
- G06T7/187 — Segmentation involving region growing, region merging or connected component labelling (G06T7/10)
- G06T7/194 — Segmentation involving foreground-background segmentation (G06T7/10)
- G06T2207/10004 — Still image; photographic image (image acquisition modality, G06T2207/10)
- G06T2207/20024 — Filtering details (special algorithmic details, G06T2207/20)
- G06T2207/20081 — Training; learning (special algorithmic details, G06T2207/20)
- G06T2207/20084 — Artificial neural networks [ANN] (special algorithmic details, G06T2207/20)
Abstract
The invention discloses a deep-learning-based intelligent matting method for high-resolution images, which comprises the following steps: (a) given at least one input image, constructing a deep-learning-based segmentation convolutional neural network to segment the target subject region, i.e. the foreground image; (b) generating, from the segmented foreground image by means of image processing techniques, a trimap containing a target subject region, a non-target subject region and an unknown region; (c) constructing a matting convolutional neural network, taking the input image and the obtained trimap as the matting network input, and predicting the transparency map (alpha matte) of the target subject in the image; (d) post-processing the transparency map to obtain the final target subject alpha matte, and combining it with the original input image to produce the final matting result. Based on deep learning, the method takes high-resolution images as input and needs no additional manual assistance throughout the process, achieving automatic, intelligent matting of the target subject in fashion images.
Description
Technical Field
The invention relates to the field of image processing, and in particular to a deep-learning-based intelligent matting method for high-resolution images.
Background
In the digital era, unstructured image data has accumulated explosively, particularly in new retail industries such as fashion apparel, driven by e-commerce platforms such as Amazon, Tmall and Taobao and by modern social platforms such as Weibo and Instagram. For images containing complex visual information, extracting the target subject region of interest with high precision is very challenging, especially from high-resolution images.
In the fashion and e-commerce new-retail fields, a large amount of manual processing is often required to make images more aesthetically pleasing and to make the target subject stand out, and the most critical and time-consuming step is high-precision extraction, i.e. matting, of the fashion target subject, such as clothing, models, bags, shoes and hats. At present, high-precision matting is generally completed entirely manually using Adobe Photoshop (PS). In addition, interactive matting techniques are used in some general scenarios, such as applications with modest accuracy requirements. With the growing demand for intelligent matting and the development of artificial intelligence technology, deep-learning-based intelligent image matting has emerged.
Fully manual matting with Adobe Photoshop is the most common method for target subject matting; because it is completed by hand, especially by people with professional backgrounds, the precision is high. However, the process is cumbersome and time-consuming, and it places high demands on the user: non-professionals without matting experience are in most cases unable to complete high-precision manual matting.
Interactive matting improves matting efficiency to a certain extent and simplifies part of the operation through simple manual assistance, such as specifying the target subject region and the background region with the mouse. However, it usually requires additional manual assistance via mouse input and is only suitable for scenarios with low precision requirements; when the target subject region resembles the non-target region, the matting quality is poor. Moreover, for images with even slightly complex backgrounds, the difficulty and complexity of matting increase markedly and the matting quality drops greatly, so high-precision matting of high-resolution images is difficult to achieve.
At present, deep-learning-based matting techniques are often only suitable for low-resolution images in certain fields, and their precision still needs improvement. For high-resolution images, the hardware requirements and processing time are markedly higher, and for fashion target subjects such as clothing, models, bags, shoes and hats, problems such as insufficient detail handling and rough, blurred edges remain.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a deep-learning-based intelligent matting method for high-resolution images. Based on deep learning, given high-resolution fashion images such as clothing, models, bags, shoes and hats, the method needs no additional manual assistance throughout the process and achieves automatic, intelligent matting of the target subject in fashion images.
In order to solve the above technical problem, the technical scheme of the invention is as follows:
A deep-learning-based intelligent matting method for high-resolution images comprises the following steps:
(a) given at least one input image, constructing a deep-learning-based segmentation convolutional neural network to segment the target subject region, i.e. the foreground image;
(b) generating, from the segmented foreground image by means of image processing techniques, a trimap containing a target subject region, a non-target subject region and an unknown region;
(c) constructing a matting convolutional neural network, taking the input image and the obtained trimap as the matting network input, and predicting the transparency map of the target subject in the image;
(d) post-processing the transparency map to obtain the final target subject alpha matte, and combining it with the original input image to produce the final matting result.
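Step (b) leaves the trimap-generation method unspecified; a common approach (an assumption here, not the patent's prescribed technique) is to erode the binary segmentation mask to obtain the certain-foreground region and dilate it to bound the unknown band. The helper names below are illustrative:

```python
import numpy as np

def dilate(mask: np.ndarray, r: int) -> np.ndarray:
    """Naive binary dilation with a (2r+1)x(2r+1) square structuring element."""
    padded = np.pad(mask, r, constant_values=0)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + 2 * r + 1, x:x + 2 * r + 1].max()
    return out

def erode(mask: np.ndarray, r: int) -> np.ndarray:
    """Naive binary erosion, the dual of dilation."""
    padded = np.pad(mask, r, constant_values=0)
    h, w = mask.shape
    out = np.zeros_like(mask)
    for y in range(h):
        for x in range(w):
            out[y, x] = padded[y:y + 2 * r + 1, x:x + 2 * r + 1].min()
    return out

def make_trimap(mask: np.ndarray, r: int = 1) -> np.ndarray:
    """Trimap: 1.0 = certain foreground, 0.0 = certain background, 0.5 = unknown."""
    sure_fg = erode(mask, r)       # shrink: pixels that are certainly subject
    maybe_fg = dilate(mask, r)     # grow: anything outside is certainly background
    trimap = np.full(mask.shape, 0.5)
    trimap[sure_fg == 1] = 1.0
    trimap[maybe_fg == 0] = 0.0
    return trimap

mask = np.zeros((7, 7), dtype=np.uint8)
mask[2:5, 2:5] = 1                 # a 3x3 foreground square
tri = make_trimap(mask, r=1)
print(tri)
```

A production system would use a library's morphology routines instead of these loops; the unknown band width (the radius `r`) is the main tuning parameter.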
Preferably, the segmentation convolutional neural network in step (a) comprises a segmentation network training phase and a segmentation network testing phase.
Preferably, step (a) further comprises the step of:
(a1) based on a high-resolution image dataset and the corresponding manually annotated label dataset, feeding the training data into the constructed segmentation convolutional neural network, which analyses and segments the target subject region.
Preferably, the method further comprises the step of:
(a2) the objective loss function used for training the segmentation convolutional neural network is the mean square error:

L = (1/N) · Σᵢ (pᵢ - gᵢ)²

where pᵢ is the pixel value predicted by the segmentation network at the i-th pixel, gᵢ is the pixel value at the i-th pixel of the ground-truth image annotated in the training dataset, and both take values in the range [0,1].
Preferably, the method further comprises the step of:
(a3) the segmentation convolutional neural network supports image input of arbitrary size.
Preferably, the matting convolutional neural network in step (c) comprises a matting network training phase and a testing phase.
Preferably, the matting convolutional neural network in step (c) consists mainly of a binary-classification deep convolutional neural network and a refinement convolutional neural network.
Preferably, step (c) further comprises the step of:
(c1) after processing by the matting convolutional neural network, predicting the target subject transparency map of the image, computing the loss function between it and the annotated matting training data, and performing the optimization gradient computation.
Preferably, step (c) further comprises the step of:
(c11) the loss function is the binary cross-entropy loss:

L = -(1/N) · Σᵢ [gᵢ · log(pᵢ) + (1 - gᵢ) · log(1 - pᵢ)]

where pᵢ is the pixel value predicted by the matting convolutional neural network at the i-th pixel, gᵢ is the pixel value of the annotated alpha-matte data at the i-th pixel, and both take values in the interval [0,1].
Preferably, step (d) further comprises the steps of:
(d1) smoothing the original image with a Gaussian filter, controlling parameters such as the filter radius;
(d2) forming a new high-pass image as the difference between the original image and the smoothed image;
(d3) setting a clipping threshold and discarding the portion of the high-pass image below it;
(d4) adding a scaled proportion of the high-pass image back to the original image to obtain the final result.
By adopting the above technical scheme for intelligent image matting, the invention provides the following beneficial effects:
1: fine matting of high-resolution images;
2: applicability to fashion and commodity images such as clothing, models, bags, shoes and hats;
3: a fully intelligent and automated matting process;
4: a complete structural design of the whole intelligent matting pipeline.
Drawings
FIG. 1 is a schematic diagram of intelligent matting of high resolution fashion images based on deep learning in the present invention;
FIG. 2 is a schematic diagram of a high resolution image segmentation network model training technique according to the present invention;
FIG. 3 is a schematic diagram of a high resolution image segmentation network model testing technique according to the present invention;
FIG. 4 is a schematic diagram of a high resolution image matting network model training technique in the present invention;
FIG. 5 is a schematic diagram of a high resolution image matting network model testing technique in the present invention;
FIG. 6 is a flow chart of the transparent image post-processing of the target subject in the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the drawings. It should be noted that the description of the embodiments is provided to help understanding of the present invention, but the present invention is not limited thereto. In addition, the technical features involved in the embodiments of the present invention described below may be combined with each other as long as they do not conflict with each other.
As shown in fig. 1, the implementation principle provided by the invention is as follows. Given an input image, deep learning is adopted: first, a deep segmentation convolutional neural network is constructed to segment the target subject region, performing a preliminary pixel-level prediction to obtain the target subject region in the image, i.e. the foreground image. Next, a trimap comprising a target subject (foreground) region, a non-target subject (background) region and an unknown region is generated from the segmented foreground image by image processing techniques. Then, a matting convolutional neural network is constructed, taking the input image and the obtained trimap as the matting network input, and predicting the transparency map (alpha matte) of the target subject in the image. Finally, the alpha matte is further post-processed, e.g. detail enhancement, edge smoothing and sharpening, to obtain the final target subject alpha matte, which is combined with the original input image to produce the final matting result. In addition, the matting result supports automatic conversion, storage and export in formats such as PNG and PSD.
According to the above implementation principle, the invention provides a specific embodiment. The image I to be matted is modelled as a linear combination of the foreground F and the background B weighted by the transparency α:

I = αF + (1 - α)B, α ∈ [0,1].
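The compositing model I = αF + (1 - α)B can be exercised directly; a minimal NumPy sketch with a synthetic red foreground and blue background (the values are illustrative):

```python
import numpy as np

# Synthetic 2x2 RGB foreground (red) and background (blue), values in [0, 1].
F = np.zeros((2, 2, 3)); F[..., 0] = 1.0           # pure red foreground
B = np.zeros((2, 2, 3)); B[..., 2] = 1.0           # pure blue background
alpha = np.array([[1.0, 0.5],
                  [0.5, 0.0]])[..., None]          # per-pixel transparency

# The matting model from the text: I = alpha * F + (1 - alpha) * B
I = alpha * F + (1.0 - alpha) * B

print(I[0, 0])   # fully foreground, so pure red
print(I[1, 1])   # fully background, so pure blue
```

Matting is the inverse problem: given only I, recover α (and implicitly F), which is what the segmentation and matting networks estimate.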
Based on this mathematical formulation of the matting problem, high-resolution image matting is divided into an image segmentation stage and a matting stage, wherein the image segmentation stage comprises a segmentation network training phase and a segmentation network testing phase.
As shown in fig. 2, in the segmentation network training phase, based on a high-resolution image dataset and the corresponding manually annotated label dataset, the training data are fed into the constructed segmentation convolutional neural network, which analyses and segments the target subject region (foreground map). The optimization goal is the degree of fit between the predicted foreground map and the annotated target foreground map. A binary-classification deep convolutional neural network distinguishing the target subject (foreground) from the non-target subject (background) is adopted; it is mainly based on an encoder-decoder network structure with atrous spatial pyramid pooling (ASPP), and a Sigmoid layer is adopted as the activation function of the last convolutional layer. The Sigmoid activation function is:

σ(y) = 1 / (1 + e^(-y))

where y is the input variable. The Sigmoid activation layer maps the output of the network's forward computation into [0,1], and has the advantages of being smooth and easy to differentiate.
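The Sigmoid mapping and its convenient derivative σ'(y) = σ(y)(1 - σ(y)) can be sketched as follows; the numerically stable split by sign is standard practice, not part of the patent text:

```python
import numpy as np

def sigmoid(y: np.ndarray) -> np.ndarray:
    """Numerically stable sigmoid: maps any real input into [0, 1]."""
    out = np.empty_like(y, dtype=float)
    pos = y >= 0
    out[pos] = 1.0 / (1.0 + np.exp(-y[pos]))
    ey = np.exp(y[~pos])            # avoids exp overflow for very negative y
    out[~pos] = ey / (1.0 + ey)
    return out

def sigmoid_grad(y: np.ndarray) -> np.ndarray:
    """d sigma / dy = sigma(y) * (1 - sigma(y)), the easy-differentiation property."""
    s = sigmoid(y)
    return s * (1.0 - s)

y = np.array([-1000.0, -1.0, 0.0, 1.0, 1000.0])
s = sigmoid(y)
print(s)   # all values lie in [0, 1]; sigmoid(0) == 0.5
```

This is why a Sigmoid last layer suits per-pixel foreground probability and alpha prediction: outputs are already valid [0,1] pixel values.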
The objective loss function adopted for training the segmentation convolutional neural network is the mean square error (MSE):

L = (1/N) · Σᵢ √((pᵢ - gᵢ)² + ε²)

where pᵢ is the pixel value predicted by the segmentation network at the i-th pixel, gᵢ is the pixel value at the i-th pixel of the ground-truth image annotated in the training dataset, and both take values in [0,1]. ε is a small constant, set to 10⁻¹², which avoids numerical divergence when the gradient is computed during back-propagation of the optimization objective, thereby preventing network training errors. The gradient of the loss function is computed as:

∂L/∂pᵢ = (pᵢ - gᵢ) / (N · √((pᵢ - gᵢ)² + ε²)).
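The ε-regularized loss and its gradient can be checked numerically. A minimal sketch, assuming the loss is the mean over pixels of √((pred - gt)² + ε²), which is consistent with the ε described here (function and variable names are mine):

```python
import numpy as np

EPS = 1e-12  # the small constant from the text; keeps the gradient finite when pred == gt

def seg_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean over all pixels of sqrt((pred - gt)^2 + eps^2)."""
    return float(np.mean(np.sqrt((pred - gt) ** 2 + EPS ** 2)))

def seg_loss_grad(pred: np.ndarray, gt: np.ndarray) -> np.ndarray:
    """Analytic gradient dL/dpred of the loss above."""
    d = pred - gt
    return d / (pred.size * np.sqrt(d ** 2 + EPS ** 2))

pred = np.array([[0.2, 0.8], [0.5, 0.1]])
gt   = np.array([[0.0, 1.0], [0.5, 0.4]])
grad = seg_loss_grad(pred, gt)

# Finite-difference check at pixel (0, 0): d = 0.2, so dL/dpred = 0.2 / (4 * 0.2) = 0.25
h = 1e-6
bumped = pred.copy()
bumped[0, 0] += h
numeric = (seg_loss(bumped, gt) - seg_loss(pred, gt)) / h
print(numeric, grad[0, 0])
```

Note pixel (1, 0), where pred equals gt: without ε the gradient would be 0/0; with ε it is a well-defined 0.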
as shown in fig. 3, in the segmentation network testing stage, for a given input image, semantic analysis is performed by using a segmentation network model obtained by training, so that a target main body region (foreground image) in the image can be obtained. The segmentation neural network adopted by the technical scheme supports image input of any size, particularly can output a prediction result corresponding to an input image aiming at the input of a high-resolution image, effectively reduces image information loss, and has better processing effects on details, edges and the like.
Further, the matting stage comprises a matting network training phase and a matting network testing phase:
as shown in fig. 4, in the training stage of the matting convolution network, based on a training image data set and an annotation data set, a segmentation network model is first used to generate a corresponding image foreground map, and an image processing method is used to generate a Trimap (Trimap) from the foreground map. Then, the training image data set and the trimap image data set are used together as the input of the matting and convolution neural network. The matting convolutional neural network mainly comprises a two-class deep convolutional neural network and a fine tuning convolutional neural network. Similar to the split convolutional network, the two-class convolutional neural network and the fine convolutional neural network both use Sigmoid functions as the activation function layers of the last convolutional layer. After the matting convolution network processing, predicting to obtain a target main body transparent image (Alpha) of the image, calculating a loss function between the target main body transparent image and the marked matting training data, and performing optimization gradient calculation. Wherein, the target Loss function adopts a Binary Cross Entropy Loss function (Binary Cross Entropy Loss):
wherein the content of the first and second substances,is the pixel value of the image predicted by the matting convolutional neural network at the ith pixel point,is the pixel value of the marked sectional image data at the ith pixel point, and the threshold value ranges of the marked sectional image data and the marked sectional image data are both [0,1]]An interval.
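A minimal sketch of the binary cross-entropy loss over alpha values; the clipping of predictions away from 0 and 1 is my addition for numerical safety, not part of the patent text:

```python
import numpy as np

def bce_loss(pred: np.ndarray, gt: np.ndarray) -> float:
    """-(1/N) * sum(gt*log(pred) + (1-gt)*log(1-pred)), pixel values in [0, 1]."""
    p = np.clip(pred, 1e-7, 1.0 - 1e-7)   # keep log() finite at the endpoints
    return float(-np.mean(gt * np.log(p) + (1.0 - gt) * np.log(1.0 - p)))

gt   = np.array([[1.0, 0.0], [1.0, 0.0]])    # annotated alpha matte
good = np.array([[0.9, 0.1], [0.8, 0.2]])    # mostly correct prediction
bad  = np.array([[0.1, 0.9], [0.2, 0.8]])    # confidently wrong prediction

print(bce_loss(good, gt))   # small loss
print(bce_loss(bad, gt))    # large loss: confident wrong pixels are penalized heavily
```

The steep penalty on confidently wrong pixels is why cross-entropy is a natural fit for the mostly-binary alpha values outside the unknown band.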
As shown in fig. 5, in the matting network testing phase, given an input image, the target region (foreground image) is first segmented with the trained segmentation network model, and a trimap is obtained by an image processing method; then the input image and its trimap are used as the input of the matting network, and prediction with the trained matting model yields the target subject transparency map (alpha matte) of the input image. At this point most target subject matting requirements are in fact already met: the target subject region is accurately identified and a good result is obtained.
Further, as shown in fig. 6, a post-processing flow is provided after the target subject transparency map is obtained. Although processing the input image with the segmentation model and the matting model already yields a high-precision transparency result, in a few cases certain flaws may appear. To further improve the result, details, edges and the like are optimized with conventional image processing algorithms such as the unsharp mask algorithm and Gaussian blur, which effectively resolves problems such as white fringes, rough edges and blurred details in some scenes. The unsharp mask algorithm changes the visual appearance of the transparency map considerably by enhancing its high-frequency content, and mainly comprises the following steps:
1) smooth the original image with a Gaussian filter, controlling parameters such as the filter radius;
2) form a new high-pass image as the difference between the original image and the smoothed image;
3) set a clipping threshold and discard the portion of the high-pass image below it;
4) add a scaled proportion of the high-pass image back to the original image to obtain the final result.
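The four unsharp-mask steps above can be sketched in NumPy; the separable Gaussian kernel construction and the parameter values (radius, sigma, threshold, amount) are illustrative assumptions, not the patent's settings:

```python
import numpy as np

def gaussian_blur(img: np.ndarray, radius: int = 2, sigma: float = 1.0) -> np.ndarray:
    """Step 1: smooth with a separable Gaussian filter of the given radius."""
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-x ** 2 / (2.0 * sigma ** 2))
    k /= k.sum()                                   # normalized 1-D kernel
    padded = np.pad(img, radius, mode="edge")
    # horizontal pass, then vertical pass, of the 1-D kernel
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, tmp)

def unsharp_mask(img, radius=2, sigma=1.0, threshold=0.02, amount=0.6):
    blurred = gaussian_blur(img, radius, sigma)        # step 1: smoothed image
    highpass = img - blurred                           # step 2: high-pass image
    highpass[np.abs(highpass) < threshold] = 0.0       # step 3: clip small detail
    return np.clip(img + amount * highpass, 0.0, 1.0)  # step 4: add detail back

alpha = np.zeros((8, 8))
alpha[:, 4:] = 1.0                # an alpha matte with a hard vertical edge
sharpened = unsharp_mask(alpha)
```

On this matte the flat regions are untouched while the transition band is pushed toward 0 and 1, i.e. the edge becomes crisper, which matches the stated goal of fixing blurred edges.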
Finally, the post-processed transparency map is composited with the input image to obtain the final matting result. In addition, the intelligent matting technique supports exporting and storing the result in various output formats, such as PNG and PSD.
Following the operation flow of this embodiment, the deep-learning-based intelligent matting technique for high-resolution fashion images quickly achieves high-precision matting of the target subject region; the whole process only requires the image to be processed, with no manual intervention or interaction, and the operation is very simple. The matting quality for fashion target subject regions such as clothing, models, bags, shoes and hats is fine enough to approach the result of manual Adobe Photoshop processing, reaching pixel-level accuracy even for hair, which makes application in real scenarios much easier. In particular, the processing quality for high-resolution commodity images reaches the level of industrial application and can meet the needs of online merchants and offline new retail in application scenarios such as transparent product images, poster production and on-screen display.
The method mainly achieves accurate and fine matting of the target subject region in high-resolution images; the matting result is faithful, the details are fine, and the edges are smooth and sharp. It is suitable for fashion and commodity images such as clothing, models, bags, shoes and hats, meeting the industrial requirements of practical application scenarios, and it has good generalization ability and robustness, so it can be rapidly extended to further fashion- and commodity-related application fields, such as automobile and pet matting.
In summary of the above specific embodiment, the invention provides a deep-learning-based intelligent matting method for high-resolution images, comprising the following steps:
(a) given at least one input image, constructing a deep-learning-based segmentation convolutional neural network to segment the target subject region, i.e. the foreground image;
(b) generating, from the segmented foreground image by means of image processing techniques, a trimap containing a target subject region, a non-target subject region and an unknown region;
(c) constructing a matting convolutional neural network, taking the input image and the obtained trimap as the matting network input, and predicting the transparency map of the target subject in the image;
(d) post-processing the transparency map to obtain the final target subject alpha matte, and combining it with the original input image to produce the final matting result.
According to the above intelligent matting method, the segmentation convolutional neural network in step (a) comprises a segmentation network training phase and a segmentation network testing phase.
Specifically, step (a) further comprises the step of:
(a1) based on a high-resolution image dataset and the corresponding manually annotated label dataset, feeding the training data into the constructed segmentation convolutional neural network, which analyses and segments the target subject region.
Further, the method also comprises the step of:
(a2) the objective loss function used for training the segmentation convolutional neural network is the mean square error:

L = (1/N) · Σᵢ (pᵢ - gᵢ)²

where pᵢ is the pixel value predicted by the segmentation network at the i-th pixel, gᵢ is the pixel value at the i-th pixel of the ground-truth image annotated in the training dataset, and both take values in the range [0,1].
Further, the method also comprises the step of:
(a3) the segmentation convolutional neural network supports image input of arbitrary size.
According to the above intelligent matting method, the matting convolutional neural network in step (c) comprises a matting network training phase and a testing phase.
Specifically, the matting convolutional neural network in step (c) consists mainly of a binary-classification deep convolutional neural network and a refinement convolutional neural network.
Further, step (c) also comprises the step of:
(c1) after processing by the matting convolutional neural network, predicting the target subject transparency map of the image, computing the loss function between it and the annotated matting training data, and performing the optimization gradient computation.
Further, step (c) also comprises the step of:
(c11) the loss function is the binary cross-entropy loss:

L = -(1/N) · Σᵢ [gᵢ · log(pᵢ) + (1 - gᵢ) · log(1 - pᵢ)]

where pᵢ is the pixel value predicted by the matting convolutional neural network at the i-th pixel, gᵢ is the pixel value of the annotated alpha-matte data at the i-th pixel, and both take values in the interval [0,1].
According to the above intelligent matting method, the step (d) further comprises the following steps:
(d1) smoothing the original image with a Gaussian filter, with control over parameters such as the filtering radius;
(d2) forming a new high-pass image from the difference between the original image and the smoothed image;
(d3) setting a clipping threshold and discarding the portion of the high-pass image that falls below it;
(d4) adding a certain proportion of the remaining high-pass image back to the original image to obtain the final result.
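Steps (d1)-(d4) amount to unsharp masking with a clipped high-pass component. A self-contained NumPy sketch follows; the kernel construction, parameter names and default values (radius, sigma, clip_thresh, amount) are illustrative assumptions, not values taken from the patent:

```python
import numpy as np

def gaussian_kernel(radius, sigma):
    # 1-D normalized Gaussian; the filter is separable, so the same
    # kernel can blur rows and then columns.
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()

def gaussian_blur(img, radius=2, sigma=1.0):
    # (d1) smooth with a Gaussian filter; edge padding keeps the output
    # the same size as the input.
    k = gaussian_kernel(radius, sigma)
    padded = np.pad(img, radius, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)

def sharpen(img, radius=2, sigma=1.0, clip_thresh=0.02, amount=0.5):
    blurred = gaussian_blur(img, radius, sigma)
    high_pass = img - blurred                            # (d2) difference image
    high_pass[np.abs(high_pass) < clip_thresh] = 0.0     # (d3) clip small responses
    return np.clip(img + amount * high_pass, 0.0, 1.0)  # (d4) add a proportion back
```

On a flat image the high-pass component is zero everywhere, so the result is unchanged; across an intensity step the bright side is pushed brighter and the dark side darker, which is the edge-enhancement effect step (d) is after.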
The embodiments of the present invention have been described in detail with reference to the accompanying drawings, but the present invention is not limited to the described embodiments. It will be apparent to those skilled in the art that various changes, modifications, substitutions and alterations can be made to these embodiments without departing from the principles and spirit of the invention, and such variants remain within the scope of protection of the invention.
Claims (10)
1. A high-resolution image intelligent matting method based on deep learning is characterized by comprising the following steps:
(a) given at least one input image, constructing a deep-learning-based segmentation convolutional neural network to segment the target main body region, namely the foreground image;
(b) generating, from the segmented foreground image by means of image processing technology, a trimap containing a target main body area, a non-target main body area and an unknown area;
(c) constructing a matting convolutional neural network, and taking the input image together with the obtained trimap as the matting network input to obtain the transparent image of the target main body in the image;
(d) post-processing the transparent image to obtain the target main body transparent image, and combining it with the original input image to produce the final matting result.
2. The intelligent matting method according to claim 1, wherein in step (a) the segmented convolutional neural network includes a segmented network training phase and a segmented network testing phase.
3. The intelligent matting method according to claim 2, further comprising in step (a) the steps of:
(a1) based on the high-resolution image data set and the corresponding manually labelled annotation data set, feeding the training data into the constructed segmentation convolutional neural network, which analyses and segments the target main body region.
4. The intelligent matting method according to claim 3, further comprising the steps of:
(a2) the objective loss function adopted for training the segmentation convolutional neural network is the mean square error, with the specific formula:
L = (1/N)Σ_i (p_i − g_i)^2, wherein p_i is the pixel value predicted by the segmentation network at the i-th pixel point, g_i is the pixel value of the ground-truth image labelled in the training data set at the i-th pixel point, and both take values in the interval [0, 1].
5. The intelligent matting method according to any one of claims 1 to 4, characterized by further comprising the steps of:
(a3) the segmentation convolutional neural network supports image inputs of arbitrary size.
6. The intelligent matting method according to claim 1, wherein in step (c) the matting convolutional neural network includes a matting convolutional network training stage and a testing stage.
7. The intelligent matting method according to claim 6, wherein in step (c) the matting convolutional neural network is mainly composed of a binary deep convolutional neural network and a fine convolutional neural network.
8. The intelligent matting method according to claim 7, further comprising the following steps in step (c):
(c1) after processing by the matting convolutional neural network, the transparent image of the target main body is obtained by prediction; the loss function between this transparent image and the labelled matting training data is calculated, and the optimization gradient is computed.
9. The intelligent matting method according to claim 8, further comprising the following steps in step (c):
(c11) the loss function adopts the binary cross entropy, with the formula L = −(1/N)Σ_i [g_i·log(p_i) + (1 − g_i)·log(1 − p_i)], wherein p_i is the pixel value predicted by the matting convolutional neural network at the i-th pixel point, g_i is the pixel value of the labelled matting data at the i-th pixel point, and both take values in the interval [0, 1].
10. The intelligent matting method according to claim 1, further comprising the following steps in step (d):
(d1) smoothing the original image with a Gaussian filter, with control over parameters such as the filtering radius;
(d2) forming a new high-pass image from the difference between the original image and the smoothed image;
(d3) setting a clipping threshold and discarding the portion of the high-pass image that falls below it;
(d4) adding a certain proportion of the remaining high-pass image back to the original image to obtain the final result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911382539.7A CN113052755A (en) | 2019-12-27 | 2019-12-27 | High-resolution image intelligent matting method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113052755A true CN113052755A (en) | 2021-06-29 |
Family
ID=76507161
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911382539.7A Pending CN113052755A (en) | 2019-12-27 | 2019-12-27 | High-resolution image intelligent matting method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113052755A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title
---|---|---|---|---|
CN113409224A (en) * | 2021-07-09 | 2021-09-17 | 浙江大学 | Image target pertinence enhancing method, device, equipment and storage medium |
CN113487630A (en) * | 2021-07-14 | 2021-10-08 | 辽宁向日葵教育科技有限公司 | Image matting method based on material analysis technology |
CN114140547A (en) * | 2021-12-07 | 2022-03-04 | 北京百度网讯科技有限公司 | Image generation method and device |
CN114140547B (en) * | 2021-12-07 | 2023-03-14 | 北京百度网讯科技有限公司 | Image generation method and device |
CN114399454A (en) * | 2022-01-18 | 2022-04-26 | 平安科技(深圳)有限公司 | Image processing method, image processing device, electronic equipment and storage medium |
CN114710703A (en) * | 2022-03-29 | 2022-07-05 | 稿定(厦门)科技有限公司 | Live broadcast method and device with variable scenes |
WO2024007135A1 (en) * | 2022-07-04 | 2024-01-11 | 北京小米移动软件有限公司 | Image processing method and apparatus, terminal device, electronic device, and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107452010A (en) * | 2017-07-31 | 2017-12-08 | 中国科学院长春光学精密机械与物理研究所 | Automatic matting algorithm and device |
CN108363879A (en) * | 2018-02-27 | 2018-08-03 | 杭州深绘智能科技有限公司 | Data processing method suitable for clothing images |
CN108460770A (en) * | 2016-12-13 | 2018-08-28 | 华为技术有限公司 | Matting method and device |
CN109145922A (en) * | 2018-09-10 | 2019-01-04 | 成都品果科技有限公司 | Automatic matting system |
CN110008832A (en) * | 2019-02-27 | 2019-07-12 | 西安电子科技大学 | Automatic character-image segmentation method based on deep learning, information data processing terminal |
CN110322468A (en) * | 2019-06-04 | 2019-10-11 | 广东工业大学 | Automatic image editing method |
CN110400323A (en) * | 2019-07-30 | 2019-11-01 | 上海艾麒信息科技有限公司 | Automatic matting system, method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113052755A (en) | High-resolution image intelligent matting method based on deep learning | |
Song et al. | Deep depth super-resolution: Learning depth super-resolution using deep convolutional neural network | |
Xu et al. | Multi-exposure image fusion techniques: A comprehensive review | |
CN109035172B (en) | Non-local mean ultrasonic image denoising method based on deep learning | |
Xu et al. | AutoSegNet: An automated neural network for image segmentation | |
CN116310095A (en) | Multi-view three-dimensional reconstruction method based on deep learning | |
Qiao et al. | Layered input GradiNet for image denoising | |
Li et al. | A multiscale dilated residual network for image denoising | |
Chen et al. | Structure-preserving image smoothing with semantic cues | |
CN113436224B (en) | Intelligent image clipping method and device based on explicit composition rule modeling | |
Huang et al. | A stereo matching algorithm based on the improved PSMNet | |
Bi et al. | Moving object detection based on fusion of depth information and RGB features | |
CN114372931A (en) | Target object blurring method and device, storage medium and electronic equipment | |
Xu et al. | A two-stage segmentation of sublingual veins based on compact fully convolutional networks for Traditional Chinese Medicine images | |
Ding et al. | Rethinking click embedding for deep interactive image segmentation | |
Lin et al. | Dyspn: Learning dynamic affinity for image-guided depth completion | |
Liu et al. | Dual UNet low-light image enhancement network based on attention mechanism | |
Xiong et al. | Single image super-resolution via image quality assessment-guided deep learning network | |
Huang et al. | Image style transfer for autonomous multi-robot systems | |
Li et al. | Inductive guided filter: Real-time deep image matting with weakly annotated masks on mobile devices | |
CN116129417A (en) | Digital instrument reading detection method based on low-quality image | |
Tao et al. | MTIE-Net: Multi-technology fusion of low-light image enhancement network | |
Gao et al. | Single image dehazing based on single pixel energy minimization | |
Liu et al. | Super-pixel guided low-light images enhancement with features restoration | |
CN114820423A (en) | Automatic cutout method based on saliency target detection and matching system thereof |
Legal Events
Date | Code | Title | Description
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210629 |