CN111047604B - Transparency mask extraction method and device for high-definition image and storage medium - Google Patents

Transparency mask extraction method and device for high-definition image and storage medium

Info

Publication number
CN111047604B
CN111047604B (application CN201911203685.9A)
Authority
CN
China
Prior art keywords: region, pixel, node, value, foreground
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911203685.9A
Other languages
Chinese (zh)
Other versions
CN111047604A (en)
Inventor
冯夫健
王林
黄翰
谭棉
刘爽
魏嘉银
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guizhou Minzu University
Original Assignee
Guizhou Minzu University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guizhou Minzu University filed Critical Guizhou Minzu University
Priority to CN201911203685.9A priority Critical patent/CN111047604B/en
Publication of CN111047604A publication Critical patent/CN111047604A/en
Application granted granted Critical
Publication of CN111047604B publication Critical patent/CN111047604B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G06T7/11 — Physics; Computing; Image data processing; Image analysis; Segmentation; Region-based segmentation
    • G06T7/136 — Physics; Computing; Image data processing; Image analysis; Segmentation; Edge detection involving thresholding
    • G06T7/90 — Physics; Computing; Image data processing; Image analysis; Determination of colour characteristics
    • Y02T10/40 — Climate change mitigation technologies related to transportation; Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a transparency mask extraction method, device and storage medium for high-definition images. The method comprises: marking an unknown region in the high-definition image; dividing the unknown region into a plurality of sub-regions according to the pixel information within it; converting each sub-region into a node of a graph structure, calculating edge weights between adjacent nodes, and generating the graph structure from these edge weights; and generating a node optimization queue from the edge weights, determining the foreground and background regions from the queue, selecting pixel values, and solving for their optimal values. The method partitions the high-definition image into regions at the pixel level, expresses the partitioned regions as graph nodes, computes edge weights, derives a node optimization queue from those weights, rapidly determines the foreground and background regions within the queue, and finally obtains the optimal foreground mask value, achieving both high accuracy and high speed.

Description

Transparency mask extraction method and device for high-definition image and storage medium
Technical Field
The invention mainly relates to the technical field of image processing, in particular to a transparency mask extraction method and device for a high-definition image and a storage medium.
Background
The resolution of images captured by mobile devices such as phones and cameras keeps increasing. Transparency mask extraction for high-resolution (high-definition) images is mainly applied in film and television special effects, where different foreground targets are composited into a specified scene; the higher the extraction precision, the better the visual quality of the composite. Existing methods for extracting transparency masks from high-definition images suffer from excessive computation time and low accuracy.
Disclosure of Invention
The invention aims to solve the technical problem of providing a transparency mask extraction method, a device and a storage medium for a high-definition image aiming at the defects of the prior art.
The technical scheme for solving the technical problems is as follows: a transparency mask extraction method of a high-definition image comprises the following steps:
inputting a high-definition image, and marking an unknown region, a foreground region and a background region in the high-definition image;
dividing the unknown region into a plurality of sub-regions according to pixel information in the unknown region;
converting each sub-region into nodes of a graph structure, calculating edge weights between adjacent nodes, and generating the graph structure according to each edge weight;
generating a node optimization queue according to edge weights among nodes, selecting pixel values among the plurality of subareas, the foreground area and the background area, carrying out optimal value solving on the selected pixel values according to the node optimization queue, and taking the optimal value obtained by solving as an optimal foreground mask value.
The other technical scheme for solving the technical problems is as follows: a transparency mask extraction apparatus of a high definition image, comprising:
the calibration module is used for inputting a high-definition image and calibrating an unknown region, a foreground region and a background region in the high-definition image;
the region segmentation module is used for segmenting the unknown region into a plurality of sub-regions according to the pixel information in the unknown region;
the graph structure generation module is used for converting each sub-region into nodes of the graph structure, calculating edge weights between adjacent nodes and generating the graph structure according to each edge weight;
and the optimization module is used for generating a node optimization queue according to the edge weights among the nodes, selecting pixel values from the plurality of subareas, the foreground area and the background area, carrying out optimal value solving on the selected pixel values according to the node optimization queue, and taking the optimal value obtained by solving as an optimal foreground mask value.
The other technical scheme for solving the technical problems is as follows: a transparency mask extraction apparatus for a high definition image, comprising a memory, a processor and a computer program stored in the memory and executable on the processor, which when executed by the processor implements a transparency mask extraction method for a high definition image as described above.
The other technical scheme for solving the technical problems is as follows: a computer readable storage medium storing a computer program which, when executed by a processor, implements a transparency mask extraction method of a high definition image as described above.
The beneficial effects of the invention are as follows: the high-definition image is partitioned into regions at the pixel level; the partitioned regions are expressed as nodes of a graph structure; edge weights are calculated and a node optimization queue is obtained from them; the foreground and background regions are rapidly determined within the queue, and the pixel values in these regions are optimally solved to obtain the optimal foreground mask value. The computation is therefore both accurate and fast.
Drawings
Fig. 1 is a flowchart illustrating a transparency mask extraction method for a high-definition image according to an embodiment of the present invention;
fig. 2 is a schematic functional block diagram of a transparency mask extracting device for a high-definition image according to an embodiment of the present invention;
fig. 3 is a schematic node diagram of a graph structure according to an embodiment of the present invention.
Detailed Description
The principles and features of the present invention are described below with reference to the drawings; the examples are provided to illustrate the invention and are not to be construed as limiting its scope.
Fig. 1 is a flowchart illustrating a transparency mask extraction method for a high-definition image according to an embodiment of the present invention.
As shown in fig. 1, a transparency mask extraction method for a high-definition image includes the following steps:
inputting a high-definition image, and marking an unknown region in the high-definition image;
dividing the unknown region into a plurality of sub-regions according to pixel information in the unknown region;
converting each sub-region into nodes of a graph structure, calculating edge weights between adjacent nodes, and generating the graph structure according to each edge weight;
generating a node optimization queue according to edge weights among nodes, selecting pixel values among the plurality of subareas, the foreground area and the background area, carrying out optimal value solving on the selected pixel values according to the node optimization queue, and taking the optimal value obtained by solving as an optimal foreground mask value.
Specifically, the node optimization queue is generated from the edge weights by a minimum-spanning-tree method.
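The patent names a minimum-spanning-tree method but gives no pseudocode for it; a minimal sketch of deriving a node visit order from the edge weights with Prim's algorithm (all function and variable names here are hypothetical illustrations, not the patent's) might look like:

```python
import heapq

def mst_visit_order(num_nodes, edges, start=0):
    """Order nodes by Prim's MST growth over weighted region-adjacency edges.

    edges: dict mapping node -> list of (neighbor, weight) pairs.
    Returns the order in which nodes are absorbed into the tree; low-weight
    edges are crossed first, so this can serve as a node optimization queue.
    """
    visited = [False] * num_nodes
    order = []
    heap = [(0.0, start)]          # (weight of the edge into the tree, node)
    while heap:
        w, u = heapq.heappop(heap)
        if visited[u]:
            continue
        visited[u] = True
        order.append(u)
        for v, wt in edges.get(u, []):
            if not visited[v]:
                heapq.heappush(heap, (wt, v))
    return order

# Tiny region-adjacency graph: 0-1 cheap, 1-2 cheap, 0-2 expensive.
adj = {0: [(1, 0.1), (2, 0.9)], 1: [(0, 0.1), (2, 0.2)], 2: [(0, 0.9), (1, 0.2)]}
print(mst_visit_order(3, adj))  # → [0, 1, 2]
```

Ordering nodes this way means each region is optimized right after its most similar already-solved neighbour, which is consistent with the patent's use of one node's result as the next node's initial solution.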
It should be understood that a high-definition image refers to a high-resolution image, i.e., an image whose vertical resolution is 720 pixels (720p) or more.
It should be understood that marking the unknown region, the foreground region and the background region in the high-definition image specifically means dilating the target's texture edge with a preset template: the dilated band is taken as the unknown region, the target region excluding this band is taken as the foreground region, and the remaining area outside the target is taken as the background region.
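A minimal sketch of this trimap construction, assuming the target is given as a binary mask and the "preset template" is a 4-connected structuring element applied `radius` times (both assumptions — the patent does not specify the template):

```python
import numpy as np

def make_trimap(mask, radius=1):
    """Build a trimap from a binary foreground mask (1 = target).

    The band between the dilated and eroded target edge becomes the
    unknown region (128); the remaining target interior is foreground
    (255) and everything else background (0).
    """
    m = mask.astype(bool)
    dil = m.copy()
    ero = m.copy()
    for _ in range(radius):
        p = np.pad(dil, 1)                          # dilate: OR of 4-neighbours
        dil = p[:-2, 1:-1] | p[2:, 1:-1] | p[1:-1, :-2] | p[1:-1, 2:] | p[1:-1, 1:-1]
        q = np.pad(ero, 1, constant_values=True)    # erode: AND of 4-neighbours
        ero = q[:-2, 1:-1] & q[2:, 1:-1] & q[1:-1, :-2] & q[1:-1, 2:] & q[1:-1, 1:-1]
    trimap = np.zeros(mask.shape, np.uint8)
    trimap[dil & ~ero] = 128   # unknown band around the edge
    trimap[ero] = 255          # confident foreground
    return trimap

mask = np.zeros((7, 7), np.uint8)
mask[2:5, 2:5] = 1
tm = make_trimap(mask)
```

With `radius=1` the 3×3 target keeps only its centre pixel as confident foreground, and a one-pixel band on each side of the edge is marked unknown.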
In the above embodiment, the high-definition image is partitioned into regions at the pixel level, the partitioned regions are expressed as nodes of a graph structure, edge weights are calculated, a node optimization queue is obtained from them, the foreground and background regions are rapidly determined within the queue, and the pixel values in these regions are optimally solved to yield the optimal foreground mask value; the computation is thus both accurate and fast.
Optionally, as an embodiment of the present invention, the process of dividing the unknown region into a plurality of sub-regions according to pixel information in the unknown region includes:
let the ith pixel point in the unknown region be p_i, i = 1, 2, …, n, where n is a positive integer;

calculating the mean-shift vector m(p_i) corresponding to each pixel according to the pixel mean-shift calculation formula and the pixel information in the unknown region [the original shows the formula as an image; in its standard form it reads]:

m(p_i) = [ Σ_{j=1}^{n} p_j g(‖(p_i − p_j)/h‖²) ] / [ Σ_{j=1}^{n} g(‖(p_i − p_j)/h‖²) ] − p_i

wherein any pixel p_i = (R_i, G_i, B_i, x_i, y_i) consists of five dimensions {R, G, B, x, y}: R, G and B denote the coordinates of the pixel's colour in RGB space, and x and y denote its plane coordinates on the high-definition image; g(·) is the kernel profile, h denotes the bandwidth with h > 0, and ‖·‖² denotes the squared Euclidean distance;

iteratively shifting each p_i by its mean-shift vector m(p_i) until the five-dimensional data points converge, so that each point reaches its local density maximum;

dividing the converged points corresponding to the n pixel points into w classes, such that the Euclidean distance in the five-dimensional space between any two pixel points in a class is smaller than the bandwidth h;

merging every class whose pixel count is smaller than a preset threshold M into its adjacent class, generating w′ classes, each class representing one sub-region.
It should be understood that a five-dimensional space represents five dimensions, namely { R, G, B, x, y }.
In the above embodiment, the local density is obtained from the colour of each pixel and the distances between pixels; the converged points are grouped so that the five-dimensional Euclidean distance between any two pixels in a class stays below the bandwidth, and classes are merged according to their pixel counts, yielding a set of classes each representing one sub-region.
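The mean-shift segmentation described above can be sketched as follows, assuming a flat kernel and a naive O(n²) neighbourhood search (the patent does not fix the kernel, so this is illustrative only):

```python
import numpy as np

def mean_shift(points, h=1.0, iters=30, tol=1e-3):
    """Shift each 5-D point (R, G, B, x, y) toward its local density peak.

    Flat kernel: each point moves to the mean of all original points
    within bandwidth h, one concrete form of the mean-shift vector m(p_i).
    """
    pts = np.asarray(points, float)
    modes = pts.copy()
    for _ in range(iters):
        moved = 0.0
        for i, p in enumerate(modes):
            near = pts[np.linalg.norm(pts - p, axis=1) <= h]
            new = near.mean(axis=0)
            moved = max(moved, np.linalg.norm(new - p))
            modes[i] = new
        if moved < tol:         # all points converged
            break
    return modes

def cluster(modes, h=1.0):
    """Group converged modes: a point within h of an existing class joins it."""
    labels, centers = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if np.linalg.norm(m - c) < h:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels

pts = [[0, 0, 0, 0, 0], [0.1, 0, 0, 0, 0], [5, 5, 5, 5, 5], [5, 5.1, 5, 5, 5]]
labels = cluster(mean_shift(pts, h=1.0), h=1.0)
```

The two nearby points collapse onto one mode and the two distant points onto another, producing two classes; in the patent's setting each such class becomes one sub-region of the unknown region.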
Optionally, as an embodiment of the present invention, the calculating edge weights between adjacent nodes, and generating the graph structure according to each edge weight includes:
defining the edge weight b_{i,j} between nodes i and j [the original shows the weight formula as an image; it is built from the colour term C_i and the coordinate term S_i defined below],

wherein C_i is the colour information of node i, obtained by normalizing X′_{wi}, the RGB three-dimensional colour of the midpoint pixel of the i-th class region, by the mean and the variance of the midpoint-pixel colours over all class regions; and S_i is the plane-coordinate information of node i, obtained by normalizing X′_{Si}, the plane coordinates of the midpoint pixel of the i-th class region, by the mean and the variance of the midpoint-pixel coordinates over all class regions;

generating the graph structure according to the edge weights b_{i,j} between every pair of adjacent nodes.
The midpoint pixel of the i-th class region is the pixel located at the region's centroid, whose plane coordinates are

x̄_i = (1/N_i) Σ_{j∈Ω_i} x_j,  ȳ_i = (1/N_i) Σ_{j∈Ω_i} y_j,

wherein N_i denotes the number of pixels in the i-th class region, Ω_i denotes the set of pixels of the i-th class region, x̄_i and ȳ_i denote the plane x and y coordinate values of the midpoint pixel of the i-th class region, and x_j and y_j denote the plane x and y coordinate values of the j-th pixel of the i-th class region.
The process of converting each sub-region into the node of the graph structure is as follows: the sub-regions are numbered, each of which is denoted as a node of the graph structure.
Specifically, before the edge weights are defined, each node needs to be labelled. As shown in fig. 3, the different areas of the original unknown region U corresponding to the finally generated w′ classes are labelled; each label represents one segmented area (w′ areas in total) and is a node of the graph structure.
In the above embodiment, each divided region is converted into the graph node representation, and the edge weight relationship is defined between the regions through the color and the distance, so that the node optimization queue can be conveniently generated.
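The conversion of regions into graph nodes with colour/coordinate edge weights can be sketched as below; the exact formula for b_{i,j} appears only as an image in the source, so the additive combination of colour and coordinate distances used here is an assumption:

```python
import math

def region_descriptor(pixels):
    """Mean RGB colour and centroid (x, y) of a region's pixels.

    pixels: list of (r, g, b, x, y) tuples.
    """
    n = len(pixels)
    sums = [sum(p[d] for p in pixels) / n for d in range(5)]
    return sums[:3], sums[3:]   # (mean colour, centroid)

def edge_weight(desc_i, desc_j, color_scale=1.0, space_scale=1.0):
    """Hypothetical edge weight: scaled Euclidean distances in colour + space.

    The patent's b_{i,j} formula is shown only as an image; this additive
    combination of normalized colour and coordinate distances is a guess.
    """
    ci, si = desc_i
    cj, sj = desc_j
    dc = math.dist(ci, cj) / color_scale   # colour term, cf. C_i
    ds = math.dist(si, sj) / space_scale   # coordinate term, cf. S_i
    return dc + ds

a = region_descriptor([(10, 10, 10, 0, 0), (12, 10, 10, 2, 0)])
b = region_descriptor([(200, 200, 200, 10, 10)])
w = edge_weight(a, b)
```

Similar, nearby regions thus receive small weights and dissimilar, distant ones large weights, which is the property the minimum-spanning-tree queue relies on.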
Optionally, as an embodiment of the present invention, the process of determining the foreground area and the background area according to the node optimization queue includes:
optimizing a sub-region, a foreground region and a background region corresponding to an ith node of a node optimization queue to obtain an optimal value of the ith node, and taking the optimal value as an optimal foreground mask value of the ith node;
taking the optimal foreground mask value of the ith node as initial solution information of the (i+1) th node area, and optimizing the foreground area and the background area of the (i+1) th node according to the initial solution information to obtain the optimal foreground mask value of the (i+1) th node;
and optimizing all node areas in the node optimization queue until the optimization is completed, and obtaining the optimal foreground mask value of the whole unknown area.
In the above embodiment, by optimizing each node in the node optimization queue, the optimal foreground mask value of the whole unknown region is obtained, so that the extracted transparency mask result is more accurate.
Optionally, as an embodiment of the present invention, the optimizing the sub-area, the foreground area and the background area corresponding to the ith node in the node optimizing queue to obtain the optimal value of the ith node includes:
calculating, for the foreground and background values selected for the ith node, the transparency of each unknown pixel according to the pixel calculation formula [the original shows it as an image; the standard estimate derived from the compositing equation C = αF + (1 − α)B is]

α̂_k = ( (C_k − B_k) · (F_k − B_k) ) / ‖F_k − B_k‖²

wherein C_k denotes the colour value of the kth unknown pixel in the unknown region, B_k denotes the kth background value selected in the background region, and F_k denotes the kth foreground value selected in the foreground region;

taking all pixels in the foreground region and the background region corresponding to the ith node as an optimization variable X, randomly selecting pixel values from the foreground region and the background region, and assigning the selected pixel values to X, obtaining a solution set P = (X_1, X_2, …, X_N), where N denotes the number of solutions;

evaluating each solution in the solution set P to obtain the optimal value of the ith node, the evaluation process being as follows:

if f(X_i) > f(X_j), then X_j learns towards X_i according to the learning formula X_j = X_j + λ(X_i − X_j); X_i then continues to be compared with the next solution in the solution set P to obtain comparison error values; the comparison stops when the comparison error values of all N solutions are smaller than a preset error value, yielding the optimal solution, which is taken as the optimal value of the ith node.
In the above embodiment, solution values are calculated for the selected pixel values and compared to obtain error values; comparing these against a preset error value yields the optimal value, so a more accurate foreground mask value can be obtained.
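The per-pixel evaluation described above can be sketched with the standard sampling-based alpha estimate derived from the compositing equation C = αF + (1 − α)B; the clamping to [0, 1] and the residual-based fitness are assumptions not spelled out in the source:

```python
import numpy as np

def alpha_estimate(c, f, b):
    """Least-squares alpha for the compositing equation C = aF + (1-a)B.

    a = (C - B)·(F - B) / |F - B|^2, clamped to [0, 1]. The patent shows
    its pixel formula only as an image, so this exact form is an assumption.
    """
    c, f, b = (np.asarray(v, float) for v in (c, f, b))
    d = f - b
    denom = d @ d
    if denom < 1e-12:           # degenerate: foreground equals background
        return 0.0
    return float(np.clip((c - b) @ d / denom, 0.0, 1.0))

def fitness(c, f, b):
    """Residual of the compositing equation for a candidate (F, B) sample pair."""
    a = alpha_estimate(c, f, b)
    return float(np.linalg.norm(np.asarray(c, float)
                                - (a * np.asarray(f, float)
                                   + (1 - a) * np.asarray(b, float))))

# A pixel that is a 70/30 blend of pure white and pure black:
c = [0.7, 0.7, 0.7]
a = alpha_estimate(c, [1, 1, 1], [0, 0, 0])   # → 0.7 (approximately)
```

Candidate (F, B) pairs with a lower residual explain the observed colour better; the patent's learning step X_j = X_j + λ(X_i − X_j) would then pull worse solutions towards better-scoring ones.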
Fig. 2 is a functional block diagram of a transparency mask extraction device for high-definition images according to an embodiment of the present invention.
Alternatively, as another embodiment of the present invention, as shown in fig. 2, a transparency mask extraction apparatus for a high definition image includes:
the calibration module is used for inputting a high-definition image and calibrating an unknown region, a foreground region and a background region in the high-definition image;
the region segmentation module is used for segmenting the unknown region into a plurality of sub-regions according to the pixel information in the unknown region;
the graph structure generation module is used for converting each sub-region into nodes of the graph structure, calculating edge weights between adjacent nodes and generating the graph structure according to each edge weight;
and the optimization module is used for generating a node optimization queue according to the edge weights among the nodes, selecting pixel values from the plurality of subareas, the foreground area and the background area, solving the optimal values of the selected pixel values according to the node optimization queue, and taking the optimal values obtained by solving as optimal foreground mask values.
Optionally, as an embodiment of the present invention, the area dividing module is specifically configured to:
let the ith pixel point in the unknown region be p_i, i = 1, 2, …, n, where n is a positive integer;

calculating the mean-shift vector m(p_i) corresponding to each pixel according to the pixel mean-shift calculation formula and the pixel information in the unknown region [the original shows the formula as an image; in its standard form it reads]:

m(p_i) = [ Σ_{j=1}^{n} p_j g(‖(p_i − p_j)/h‖²) ] / [ Σ_{j=1}^{n} g(‖(p_i − p_j)/h‖²) ] − p_i

wherein any pixel p_i = (R_i, G_i, B_i, x_i, y_i) consists of five dimensions {R, G, B, x, y}: R, G and B denote the coordinates of the pixel's colour in RGB space, and x and y denote its plane coordinates on the high-definition image; g(·) is the kernel profile, h denotes the bandwidth with h > 0, and ‖·‖² denotes the squared Euclidean distance;

iteratively shifting each p_i by its mean-shift vector m(p_i) until the five-dimensional data points converge, so that each point reaches its local density maximum;

dividing the converged points corresponding to the n pixel points into w classes, such that the Euclidean distance in the five-dimensional space between any two pixel points in a class is smaller than the bandwidth h;

merging every class whose pixel count is smaller than a preset threshold M into its adjacent class, generating w′ classes, each class representing one sub-region.
Optionally, as an embodiment of the present invention, the graph structure generating module is specifically configured to:
defining the edge weight b_{i,j} between nodes i and j [the original shows the weight formula as an image; it is built from the colour term C_i and the coordinate term S_i defined below],

wherein C_i is the colour information of node i, obtained by normalizing X′_{wi}, the RGB three-dimensional colour of the midpoint pixel of the i-th class region, by the mean and the variance of the midpoint-pixel colours over all class regions; and S_i is the plane-coordinate information of node i, obtained by normalizing X′_{Si}, the plane coordinates of the midpoint pixel of the i-th class region, by the mean and the variance of the midpoint-pixel coordinates over all class regions;

generating the graph structure according to the edge weights b_{i,j} between every pair of adjacent nodes.
Alternatively, as another embodiment of the present invention, a transparency mask extraction apparatus for a high definition image includes a memory, a processor, and a computer program stored in the memory and executable on the processor, which when executed by the processor, implements the transparency mask extraction method for a high definition image as described above.
Alternatively, as an embodiment of the present invention, a computer-readable storage medium storing a computer program which, when executed by a processor, implements the transparency mask extraction method of a high-definition image as described above.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed; any modifications, equivalents and alternatives falling within the spirit and scope of the invention are intended to be included within its scope.

Claims (8)

1. A method for extracting a transparency mask of a high definition image, comprising the steps of:
inputting a high-definition image, and marking an unknown region, a foreground region and a background region in the high-definition image;
dividing the unknown region into a plurality of sub-regions according to pixel information in the unknown region;
converting each sub-region into nodes of a graph structure, calculating edge weights between adjacent nodes, and generating the graph structure according to each edge weight;
generating a node optimization queue according to edge weights among nodes, selecting pixel values from the plurality of subareas, the foreground area and the background area, carrying out optimal value solving on the selected pixel values according to the node optimization queue, and taking the optimal value obtained by solving as an optimal foreground mask value;
the process for carrying out optimal value solving on the selected pixel values according to the node optimization queue comprises the following steps:
optimizing a sub-region, a foreground region and a background region corresponding to an ith node of the node optimization queue to obtain an optimal value of the ith node, and taking the optimal value as an optimal foreground mask value of the ith node;
taking the optimal foreground mask value of the ith node as initial solution information of the (i+1) th node area, and optimizing the foreground area and the background area of the (i+1) th node according to the initial solution information to obtain the optimal foreground mask value of the (i+1) th node;
optimizing all node areas in the node optimization queue until the node optimization queue is completed, and obtaining an optimal foreground mask value of the whole unknown area;
the process of optimizing the sub-region, the foreground region and the background region corresponding to the ith node of the node optimization queue to obtain the optimal value of the ith node comprises the following steps:
calculating, for the foreground and background values selected for the ith node, the transparency of each unknown pixel according to the pixel calculation formula [the original shows it as an image; the standard estimate derived from the compositing equation C = αF + (1 − α)B is]

α̂_k = ( (C_k − B_k) · (F_k − B_k) ) / ‖F_k − B_k‖²

wherein C_k denotes the colour value of the kth unknown pixel in the unknown region, B_k denotes the kth background value selected in the background region, and F_k denotes the kth foreground value selected in the foreground region;

taking all pixels in the foreground region and the background region corresponding to the ith node as an optimization variable X, randomly selecting pixel values from the foreground region and the background region, and assigning the selected pixel values to X, obtaining a solution set P = (X_1, X_2, …, X_N), where N denotes the number of solutions;

evaluating each solution in the solution set P to obtain the optimal value of the ith node, the evaluation process being as follows:

if f(X_i) > f(X_j), then X_j learns towards X_i according to the learning formula X_j = X_j + λ(X_i − X_j); X_i then continues to be compared with the next solution in the solution set P to obtain comparison error values; the comparison stops when the comparison error values of all N solutions are smaller than a preset error value, yielding the optimal solution, which is taken as the optimal value of the ith node.
2. The transparency mask extraction method of high definition image according to claim 1, wherein the process of dividing the unknown region into a plurality of sub-regions according to pixel information in the unknown region comprises:
let the ith pixel point in the unknown region be p_i, i = 1, 2, …, n, where n is a positive integer;

calculating the mean-shift vector m(p_i) corresponding to each pixel according to the pixel mean-shift calculation formula and the pixel information in the unknown region [the original shows the formula as an image; in its standard form it reads]:

m(p_i) = [ Σ_{j=1}^{n} p_j g(‖(p_i − p_j)/h‖²) ] / [ Σ_{j=1}^{n} g(‖(p_i − p_j)/h‖²) ] − p_i

wherein any pixel p_i = (R_i, G_i, B_i, x_i, y_i) consists of five dimensions {R, G, B, x, y}: R, G and B denote the coordinates of the pixel's colour in RGB space, and x and y denote its plane coordinates on the high-definition image; g(·) is the kernel profile, h denotes the bandwidth with h > 0, and ‖·‖² denotes the squared Euclidean distance;

iteratively shifting each p_i by its mean-shift vector m(p_i) until the five-dimensional data points converge, so that each point reaches its local density maximum;

dividing the converged points corresponding to the n pixel points into w classes, such that the Euclidean distance in the five-dimensional space between any two pixel points in a class is smaller than the bandwidth h;

merging every class whose pixel count is smaller than a preset threshold M into its adjacent class, generating w′ classes, each class representing one sub-region.
3. The method for extracting the transparency mask of the high-definition image according to claim 1, wherein the process of calculating the edge weights between the adjacent nodes and generating the graph structure according to the respective edge weights comprises:
defining the edge weight b_{i,j} between nodes as follows:
b_{i,j} = ‖C_i − C_j‖₂ + ‖S_i − S_j‖₂
wherein C_i is the normalized color information of the pixel point, C_i = (c_i − c̄) / σ_c², and S_i is the normalized plane coordinate information of the pixel point, S_i = (s_i − s̄) / σ_s²; c_i denotes the RGB three-dimensional color information of the center pixel of the i-th class region, c̄ denotes the mean value of the RGB three-dimensional color information of the center pixels of all class regions, and σ_c² denotes the variance value of the RGB three-dimensional color information of the center pixels of all class regions; s_i denotes the plane coordinate information of the center pixel of the i-th class region, s̄ denotes the mean value of the plane coordinate information of the center pixels of all class regions, and σ_s² denotes the variance value of the plane coordinate information of the center pixels of all class regions;
generating the graph structure according to the edge weights b_{i,j} between the nodes.
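The edge-weight construction can be sketched as follows (illustration only). Because the claim renders its formulas as images, the z-score-style normalization by mean and variance and the summed-distance form of `b[i, j]` are assumptions read off the variable descriptions, as are the function and parameter names.

```python
import numpy as np

def edge_weights(centers_rgb, centers_xy):
    """Compute edge weights b[i, j] between region nodes from the color and
    plane-coordinate information of each region's center pixel.

    centers_rgb: (n, 3) RGB color of each region's center pixel (c_i).
    centers_xy:  (n, 2) plane coordinates of each region's center pixel (s_i).
    """
    # Normalize by the mean and variance over all region centers (assumed form).
    C = (centers_rgb - centers_rgb.mean(axis=0)) / centers_rgb.var(axis=0)
    S = (centers_xy - centers_xy.mean(axis=0)) / centers_xy.var(axis=0)
    n = len(C)
    b = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Combined color + spatial distance between nodes i and j.
            b[i, j] = np.linalg.norm(C[i] - C[j]) + np.linalg.norm(S[i] - S[j])
    return b
```

The resulting symmetric matrix `b` can serve directly as the weighted adjacency matrix of the graph structure over the sub-region nodes.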
4. A transparency mask extraction apparatus for a high definition image, comprising:
the calibration module is used for inputting a high-definition image and calibrating an unknown region, a foreground region and a background region in the high-definition image;
the region segmentation module is used for segmenting the unknown region into a plurality of sub-regions according to the pixel information in the unknown region;
the graph structure generation module is used for converting each sub-region into nodes of the graph structure, calculating edge weights between adjacent nodes and generating the graph structure according to each edge weight;
the optimization module is used for generating a node optimization queue according to the edge weights among the nodes, selecting pixel values from the plurality of sub-regions, the foreground region and the background region, solving for the optimal value of the selected pixel values according to the node optimization queue, and taking the obtained optimal value as the optimal foreground mask value;
in the optimization module, the process of carrying out optimal value solving on the selected pixel value according to the node optimization queue comprises the following steps:
optimizing a sub-region, a foreground region and a background region corresponding to an ith node of the node optimization queue to obtain an optimal value of the ith node, and taking the optimal value as an optimal foreground mask value of the ith node;
taking the optimal foreground mask value of the ith node as initial solution information of the (i+1) th node area, and optimizing the foreground area and the background area of the (i+1) th node according to the initial solution information to obtain the optimal foreground mask value of the (i+1) th node;
optimizing all node regions in the node optimization queue in sequence until the queue is exhausted, thereby obtaining the optimal foreground mask value of the whole unknown region;
the process of optimizing the sub-region, the foreground region and the background region corresponding to the ith node of the node optimization queue to obtain the optimal value of the ith node comprises the following steps:
calculating each pixel in the foreground region and the background region corresponding to the ith node according to a pixel calculation formula, wherein the pixel calculation formula is:
α_k = ((I_k − B_k) · (F_k − B_k)) / ‖F_k − B_k‖₂²
wherein I_k represents the color value of the kth unknown pixel in the unknown region, B_k represents the kth background value selected in the background region, and F_k represents the kth foreground value selected in the foreground region;
taking all pixels in the foreground region and the background region corresponding to the ith node as an optimization variable X, randomly selecting pixel values from the foreground region and the background region, and assigning the selected pixel values to the optimization variable X to obtain a solution set P = (X_1, X_2, …, X_N), where N represents the number of solutions;
evaluating each solution in the solution set P to obtain an optimal value of the ith node, wherein the evaluation process is as follows:
if f(X_i) > f(X_j), then X_j learns toward X_i; the learning process comprises: updating according to the learning formula X_j = X_j + λ(X_i − X_j); X_i then continues to be compared with the next solution in the solution set P to obtain comparison error values, and the comparison stops when the comparison error values of all N solutions are smaller than a preset error value, yielding the optimal solution, which is taken as the optimal value of the ith node.
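A minimal sketch (not the claimed implementation) of the per-pixel mask estimate and the learning step X_j = X_j + λ(X_i − X_j) described above. The projection form of the estimate, which is standard in sampling-based matting, the clamping to [0, 1], and the value of λ are assumptions.

```python
import numpy as np

def alpha_estimate(I, F, B):
    """Estimate the mask value for unknown pixel color I from a sampled
    foreground/background pair (F, B): project (I - B) onto (F - B)."""
    d = F - B
    denom = float(d @ d)          # squared Euclidean norm of F - B
    if denom == 0.0:
        return 0.0                # degenerate pair: F and B coincide
    a = float((I - B) @ d) / denom
    return min(1.0, max(0.0, a))  # clamp to the valid mask range [0, 1]

def learn_toward(X_j, X_i, lam=0.5):
    """Move the worse solution X_j toward the better solution X_i."""
    return X_j + lam * (X_i - X_j)
```

In the queue-based scheme above, each node's converged solution would seed the initial solutions of the next node, so neighboring sub-regions start their search near a consistent mask estimate.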
5. The transparency mask extraction device of high definition image according to claim 4, wherein the region segmentation module is specifically configured to:
let the ith pixel point on the unknown region be p_i, i = 1, 2, …, n, where n is a positive integer;
calculating the mean shift vector m(p_i) corresponding to each pixel according to a pixel mean shift calculation formula and the pixel information on the unknown region, wherein the pixel mean shift calculation formula is:
m(p_i) = (1/k) · Σ_{p_j ∈ S_h(p_i)} (p_j − p_i)
wherein any pixel point p_i consists of five dimensions {R, G, B, x, y}; R, G and B denote the coordinates of the color of pixel point p_i in RGB space, and x and y denote the plane coordinates of p_i on the high-definition image;
S_h(p_i) = { q : ‖q − p_i‖₂ ≤ h }
wherein h denotes the bandwidth, h > 0, k denotes the number of pixel points falling in S_h(p_i), and ‖·‖₂ denotes the Euclidean distance;
iteratively calculating the mean shift vector m(p_i) of p_i until the five-dimensional data points converge, so that each point reaches its local density maximum;
dividing the mean shift vectors corresponding to the n calculated pixel points into w classes, wherein the Euclidean distance in the five-dimensional space between any two pixel points in each class is smaller than the bandwidth h;
merging classes with the pixel number smaller than a preset pixel number threshold M into adjacent classes to generate w' classes, wherein each class represents a sub-region.
6. The transparency mask extraction device of high definition image according to claim 4, wherein the graph structure generation module is specifically configured to:
defining the edge weight b_{i,j} between nodes as follows:
b_{i,j} = ‖C_i − C_j‖₂ + ‖S_i − S_j‖₂
wherein C_i is the normalized color information of the pixel point, C_i = (c_i − c̄) / σ_c², and S_i is the normalized plane coordinate information of the pixel point, S_i = (s_i − s̄) / σ_s²; c_i denotes the RGB three-dimensional color information of the center pixel of the i-th class region, c̄ denotes the mean value of the RGB three-dimensional color information of the center pixels of all class regions, and σ_c² denotes the variance value of the RGB three-dimensional color information of the center pixels of all class regions; s_i denotes the plane coordinate information of the center pixel of the i-th class region, s̄ denotes the mean value of the plane coordinate information of the center pixels of all class regions, and σ_s² denotes the variance value of the plane coordinate information of the center pixels of all class regions;
generating the graph structure according to the edge weights b_{i,j} between the nodes.
7. A transparency mask extraction device for high definition images comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the transparency mask extraction method for high definition images according to any one of claims 1 to 3 is implemented when the computer program is executed by the processor.
8. A computer readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the transparency mask extraction method of a high definition image according to any one of claims 1 to 3.
CN201911203685.9A 2019-11-29 2019-11-29 Transparency mask extraction method and device for high-definition image and storage medium Active CN111047604B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911203685.9A CN111047604B (en) 2019-11-29 2019-11-29 Transparency mask extraction method and device for high-definition image and storage medium


Publications (2)

Publication Number Publication Date
CN111047604A CN111047604A (en) 2020-04-21
CN111047604B true CN111047604B (en) 2023-04-28

Family

ID=70233222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911203685.9A Active CN111047604B (en) 2019-11-29 2019-11-29 Transparency mask extraction method and device for high-definition image and storage medium

Country Status (1)

Country Link
CN (1) CN111047604B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101989353A (en) * 2010-12-10 2011-03-23 中国科学院深圳先进技术研究院 Image matting method
CN102291520A (en) * 2006-05-26 2011-12-21 佳能株式会社 Image processing method and image processing apparatus
CN102651135A (en) * 2012-04-10 2012-08-29 电子科技大学 Optimized direction sampling-based natural image matting method
CN103942794A (en) * 2014-04-16 2014-07-23 南京大学 Image collaborative cutout method based on confidence level
CN104134192A (en) * 2014-07-23 2014-11-05 中国科学院深圳先进技术研究院 Image defogging method and system
CN105931244A (en) * 2016-04-29 2016-09-07 中科院成都信息技术股份有限公司 Supervision-free image matting method and apparatus
CN106056606A (en) * 2016-05-30 2016-10-26 乐视控股(北京)有限公司 Image processing method and device
CN110400323A (en) * 2019-07-30 2019-11-01 上海艾麒信息科技有限公司 It is a kind of to scratch drawing system, method and device automatically
CN110503704A (en) * 2019-08-27 2019-11-26 北京迈格威科技有限公司 Building method, device and the electronic equipment of three components


Non-Patent Citations (2)

Title
Arunava De et al., "Masking Based Segmentation of Diseased MRI Images", 2010 International Conference on Information Science and Applications, 2010, pp. 1-7. *
Yan Xueming, "Research on Heuristic Algorithms Based on Bipartite Graph Structure Information", China Master's Theses Full-text Database, Information Science and Technology, 2019, pp. 1-52. *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant