WO2018107825A1 - Matting method and apparatus (抠图方法及装置) - Google Patents

Matting method and apparatus

Info

Publication number
WO2018107825A1
WO2018107825A1 (PCT/CN2017/100596)
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
map
image
input image
sample
Prior art date
Application number
PCT/CN2017/100596
Other languages
English (en)
French (fr)
Inventor
沈小勇
贾佳亚
鲁亚东
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2018107825A1

Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
                • G06T 7/00 Image analysis
                • G06T 2207/00 Indexing scheme for image analysis or image enhancement
                    • G06T 2207/10 Image acquisition modality
                        • G06T 2207/10004 Still image; Photographic image
                        • G06T 2207/10024 Color image
                    • G06T 2207/20 Special algorithmic details
                        • G06T 2207/20076 Probabilistic image processing
                        • G06T 2207/20081 Training; Learning
                        • G06T 2207/20084 Artificial neural networks [ANN]

Definitions

  • Embodiments of the present invention relate to the field of digital matting, and in particular to a matting method and apparatus.
  • Digital matting is a technique that decomposes a digital image I into a foreground image F and a background image B. The decomposition process of digital matting can be expressed as I = αF + (1 − α)B, where α is a number between 0 and 1 called the transparency value of the digital image, or alpha matte. The alpha matrix of the digital image I represents the matting result of I: an alpha value of 1 means the pixel belongs to the foreground, an alpha value of 0 means the pixel belongs to the background, and an alpha value between 0 and 1 means the pixel lies in a mixed foreground/background region.
  • Because α, F and B must be estimated simultaneously for every pixel of the digital image, where α is single-channel data and the F and B of each pixel are RGB (Red Green Blue) three-channel data, seven unknowns have to be estimated simultaneously for every pixel, which makes the above digital matting technique very difficult to solve accurately.
  • In the prior art, the alpha values of most pixels of the digital image are calibrated manually by the user; this calibration is also called a trimap. As shown in FIG. 1, the calibrated image includes: a foreground region 12 whose alpha value the user has calibrated to 1, a background region 14 whose alpha value the user has calibrated to 0, and an unknown region 16 whose alpha value is uncertain; the unknown region 16 is the region the matting algorithm needs to estimate.
  • After the user has calibrated the input image manually, a closed-form matting algorithm estimates the foreground and background pixels in the unknown region 16 from the user-specified foreground region 12 and background region 14, obtaining the alpha value of every pixel in the unknown region.
  • To address this, an embodiment of the present invention provides a matting method and apparatus. The input image is predicted with a fully convolutional network to obtain, for each pixel of the input image, a predicted score of belonging to the foreground region, a predicted score of belonging to the background region and a predicted score of belonging to the unknown region, so the user does not need to calibrate the input image manually; a reasonably accurate matting result is obtained and fully automatic digital matting is achieved.
  • As one possible implementation of the present application, the matting method includes:
  • inputting an input image into a preset fully convolutional network to obtain, for each pixel in the input image, a predicted score Fs of belonging to the foreground region, a predicted score Bs of belonging to the background region and a predicted score Us of belonging to the unknown region; the fully convolutional network is a neural network used to predict the region to which each pixel belongs;
  • computing, from the Fs, Bs and Us of each pixel, the foreground probability matrix F and the background probability matrix B corresponding to the input image;
  • inputting the foreground probability matrix F and the background probability matrix B into a preset matting implementation function to obtain the transparency value matrix of the input image. The matting implementation function is obtained by training the optimal solution of the matting objective equation on a first sample image with a preset error back-propagation algorithm; the first sample image and the input image have the same preset image type, and the transparency value matrix is the matrix used to matte the input image.
  • In the present invention, the input image is predicted with a preset fully convolutional network to obtain, for each pixel of the input image, the predicted scores of belonging to the foreground region, the background region and the unknown region, so the user does not need to manually draw a trimap on the input image. At the same time, the foreground probability matrix F and background probability matrix B obtained from these scores are input into the matting implementation function to obtain the transparency value matrix. Because the matting implementation function is obtained by training on a sample image set with a preset back-propagation function, and the sample image set contains a large number of first sample images with the same preset image type as the input image, the matting implementation function can produce an accurate matting result from F and B without requiring the user to repeatedly draw trimaps on the input image, so the conversion from input image to transparency value matrix is fully automatic.
  • In a first possible implementation of the first aspect, training the optimal solution of the matting objective equation on the first sample image with the preset back-propagation algorithm includes: obtaining the foreground probability matrix F, the background probability matrix B and the sample transparency value matrix of the first sample image; taking the optimal solution of the matting objective equation as the initial matting implementation function; inputting the foreground probability matrix F and the background probability matrix B of the first sample image into the matting implementation function to obtain a training transparency value matrix of the first sample image; correcting the parameters of the matting implementation function with the error back-propagation algorithm according to the error between the training transparency value matrix and the sample transparency value matrix; and repeating the above correction step until the error between the training transparency value matrix and the sample transparency value matrix is smaller than a preset threshold, which yields the trained matting implementation function.
  • This optional implementation trains the matting implementation function with the error back-propagation algorithm and takes the function whose error is below the preset threshold as the trained matting implementation function, which improves the accuracy of digital matting; moreover, the matting implementation function does not depend heavily on the accuracy of the trimap calibration of the input image.
  • In a second possible implementation, when the matting implementation function is f(F, B; λ) = λ(λB + λF + L)⁻¹F, correcting the parameters with the error back-propagation algorithm includes: when the error is greater than the preset threshold and the error back-propagation algorithm uses gradient descent, constructing the gradient used by gradient descent from the partial derivatives of f with respect to F, B and λ (the derivative expressions are rendered as images in the original filing), where f is the matting implementation function, F is the foreground probability matrix, B is the background probability matrix, λ is the parameter trained with the first sample image, D = λB + λF + L, L is the known matting Laplacian matrix, and diag is a function used to construct a diagonal matrix.
  • By using gradient descent with these partial derivatives, this optional implementation lets the training transparency value matrix approach the sample transparency value matrix more quickly and improves the training efficiency when training the matting implementation function.
  • In a third possible implementation, computing the foreground probability matrix F and the background probability matrix B corresponding to the input image includes: inputting the Fs, Bs and Us of each pixel of the input image into a first formula to obtain F, and into a second formula to obtain B (both formulas are rendered as images in the original filing), where exp is the exponential function with the natural constant e as its base.
  • This optional implementation computes the foreground probability matrix and the background probability matrix of the input image from the two formulas above as the input of the subsequent matting implementation function. Because this process normalizes the Fs, Bs and Us of the input image, it reduces the amount of computation of the subsequent matting implementation function when matting and improves matting efficiency.
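  • The two normalization formulas themselves appear only as images in the original filing. The following is a minimal sketch of one plausible reading, assuming the probabilities are a softmax over the exponentials of the three per-pixel scores; the function name and the exact softmax form are assumptions for illustration, not taken from the filing.

```python
import numpy as np

def score_maps_to_probabilities(Fs, Bs, Us):
    """Normalize per-pixel scores Fs, Bs, Us (H x W arrays) into the
    foreground and background probability matrices, assuming a softmax
    over the exponentials of the three scores."""
    # subtract the per-pixel maximum for numerical stability
    m = np.maximum(np.maximum(Fs, Bs), Us)
    eF, eB, eU = np.exp(Fs - m), np.exp(Bs - m), np.exp(Us - m)
    denom = eF + eB + eU
    F = eF / denom   # probability of belonging to the foreground region
    B = eB / denom   # probability of belonging to the background region
    return F, B
```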
  • In a fourth possible implementation, inputting the input image into the preset fully convolutional network to obtain, for each pixel, the predicted score Fs of belonging to the foreground region, the predicted score Bs of belonging to the background region and the predicted score Us of belonging to the unknown region includes: inputting the input image and a guidance map matrix into the fully convolutional network to obtain the Fs, Bs and Us of each pixel of the input image. The guidance map matrix indicates the empirical probability that each pixel of an image of the preset image type belongs to the foreground region, the background region or the unknown region; it is trained in advance on a matting sample set, and the second sample images in the matting image set have the same preset image type as the input image.
  • This optional implementation uses the guidance map matrix to assist the prediction of the fully convolutional network. Because the guidance map matrix is trained in advance on the matting sample set, whose second sample images have the same preset image type as the input image, the prediction accuracy of the fully convolutional network when predicting the trimap of the input image is improved.
  • In a fifth possible implementation, training the guidance map matrix in advance from the second sample images includes computing M according to a formula rendered as an image in the original filing, where M is the guidance map matrix, Σ is the summation function and n is the number of second sample images.
  • This optional implementation obtains the guidance map matrix from the matting sample set; the guidance map matrix indicates the empirical probability that each pixel of an image of the preset image type belongs to the foreground region, the background region or the unknown region. Because the second sample images in the matting image set and the input image have the same preset image type, the training accuracy of the guidance map matrix can be improved.
  • In a second aspect, an embodiment of the present invention provides a matting apparatus. The matting apparatus includes at least one unit, and the at least one unit is configured to implement the matting method provided by the first aspect or any possible implementation of the first aspect.
  • In a third aspect, an embodiment of the present invention provides a terminal. The terminal includes one or more processors and a memory; the memory stores one or more programs, the one or more programs are configured to be executed by the one or more processors, and the one or more programs contain instructions for implementing the matting method described in the first aspect.
  • In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium. The computer-readable storage medium stores an executable program for implementing the matting method provided by the first aspect or any possible implementation of the first aspect.
  • FIG. 1 is a schematic diagram of a calibrated input image in the prior art;
  • FIG. 2 is a flowchart of a matting method according to an embodiment of the present invention;
  • FIG. 3 is a schematic diagram of the fully convolutional network involved in the embodiment shown in FIG. 2;
  • FIG. 4A is a flowchart of a matting method according to another embodiment of the present invention;
  • FIG. 4B is a schematic diagram of a matting method according to another embodiment of the present invention;
  • FIG. 5 is a flowchart of the training process of the matting implementation function according to an embodiment of the present invention;
  • FIG. 6 is a flowchart of training the guidance map matrix according to an embodiment of the present invention;
  • FIG. 7 is a block diagram of a matting apparatus according to an embodiment of the present invention;
  • FIG. 8 is a block diagram of a terminal according to an embodiment of the present invention.
  • FIG. 2 shows a flowchart of a matting method according to an embodiment of the present invention. This embodiment is described by way of example with the matting method applied to a terminal device with image-processing capability. The matting method includes the following steps:
  • Step 201: Input the input image into a preset fully convolutional network to obtain, for each pixel of the input image, a predicted score Fs of belonging to the foreground region, a predicted score Bs of belonging to the background region and a predicted score Us of belonging to the unknown region.
  • The input image is one frame of digital image. Usually the input image contains a background region and a foreground region; for example, if the image type of the input image is the half-length portrait type, the foreground region of the input image contains a half-length portrait.
  • Optionally, the input image is a digital image using the Red Green Blue (RGB) color standard and contains M*N pixels, each represented by the three RGB color components. It should be noted that the embodiments of the present invention also apply to black-and-white images and images in other color standards, which is not limited here.
  • A fully convolutional network (FCN) is a neural network with pixel-level classification capability. In this embodiment, the preset fully convolutional network is a neural network that performs a three-way classification of every pixel of the input image, i.e. classifies each pixel into one of the foreground region, the background region and the unknown region. At the same time, the fully convolutional network can predict, for each pixel of the input image, the predicted score Fs of belonging to the foreground region, the predicted score Bs of belonging to the background region and the predicted score Us of belonging to the unknown region.
  • Optionally, a fully convolutional network usually includes convolutional layers and deconvolutional layers. The convolutional layers extract feature maps of the input image, and the deconvolutional layers upsample the extracted feature maps. A fully convolutional network has the advantages of a small model size and fast computation.
  • As shown in FIG. 3, which illustrates an example of a fully convolutional network, the network includes an input layer, at least one convolutional layer (for example three convolutional layers: a first convolutional layer C1, a second convolutional layer C2 and a third convolutional layer C3), at least one deconvolutional layer (for example three deconvolutional layers: a first deconvolutional layer D1, a second deconvolutional layer D2 and a third deconvolutional layer D3) and an output layer. The input data of the input layer are the input image and the guidance map matrix; the output of the output layer is, for each pixel of the input image, the predicted score Fs of belonging to the foreground region, the predicted score Bs of belonging to the background region and the predicted score Us of belonging to the unknown region.
  • The specific structure of the convolutional and deconvolutional layers of the fully convolutional network is not limited; the fully convolutional network shown in FIG. 3 is merely exemplary and explanatory and does not limit the embodiments of the invention. In general, the more layers a fully convolutional network has, the better its effect but the longer its computation time; in practical applications, a network with an appropriate number of layers can be designed according to the requirements for detection accuracy and efficiency.
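  • The filing fixes only the overall shape of the network (input layer, convolutional layers C1 to C3, deconvolutional layers D1 to D3, and three output score channels fed by the image plus the guidance map). The following is a minimal PyTorch sketch of such a network; the channel widths, kernel sizes and strides are assumptions for illustration, not values from the filing.

```python
import torch.nn as nn

class TrimapFCN(nn.Module):
    """3 conv + 3 deconv layers; input = RGB image concatenated with the
    guidance map channel; output = per-pixel scores Fs, Bs, Us."""
    def __init__(self, in_channels=4):  # 3 RGB channels + 1 guidance channel (assumed)
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),  # C1
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),           # C2
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.ReLU(),          # C3
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # D1
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # D2
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1),                # D3 -> Fs, Bs, Us
        )

    def forward(self, x):
        return self.decoder(self.encoder(x))  # (N, 3, H, W) score maps
```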
  • The predicted score Fs of belonging to the foreground region indicates the likelihood that the pixel belongs to the foreground region: the larger Fs, the more likely the pixel belongs to the foreground region.
  • The predicted score Bs of belonging to the background region indicates the likelihood that the pixel belongs to the background region: the larger Bs, the more likely the pixel belongs to the background region.
  • The predicted score Us of belonging to the unknown region indicates the likelihood that the pixel belongs to the unknown region: the larger Us, the more likely the pixel belongs to the unknown region.
  • Step 202: Compute the foreground probability matrix F and the background probability matrix B corresponding to the input image from the Fs, Bs and Us of each pixel of the input image.
  • The foreground probability matrix F represents the probability that each pixel of the input image belongs to the foreground region, and the background probability matrix B represents the probability that each pixel of the input image belongs to the background region.
  • Step 203: Input the foreground probability matrix F and the background probability matrix B into the preset matting implementation function to obtain the transparency value matrix of the input image.
  • The matting implementation function is a matting function obtained by training the optimal solution of the matting objective equation on the sample image set with a preset back-propagation algorithm. The first sample images in the sample image set and the input image have the same preset image type; for example, the first sample images and the input image are both half-length portrait images.
  • Optionally, the preset matting implementation function is expressed by the formula f(F, B; λ) = λ(λB + λF + L)⁻¹F, where f(F, B; λ) is the function used to solve the transparency value α of each pixel of the input image, λ is a parameter obtained by training with the first sample images, and L is the matting Laplacian matrix.
  • The terminal device inputs the foreground probability matrix F and the background probability matrix B into the preset matting implementation function to obtain the transparency value matrix of the input image. The transparency value matrix is the matrix used to matte the input image.
  • Optionally, for each pixel of an input image in the RGB color standard, the luminance value of each color component is multiplied by the transparency value at the corresponding position to obtain the matting result of the input image.
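  • As a small illustration of the last remark (multiplying every color component by the alpha value at the same position); this sketch and its function names are illustrative, not part of the filing.

```python
import numpy as np

def extract_foreground(image_rgb, alpha):
    """image_rgb: (H, W, 3) array; alpha: (H, W) transparency value matrix in [0, 1].
    Returns the matting result, i.e. each color component scaled by its alpha value."""
    return image_rgb * alpha[..., None]

def composite(image_rgb, alpha, new_background):
    """Paste the matted foreground over a new background of the same size."""
    a = alpha[..., None]
    return image_rgb * a + new_background * (1.0 - a)
```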
  • In summary, the matting method provided by this embodiment predicts the input image with a fully convolutional network to obtain, for each pixel, the predicted scores of belonging to the foreground region, the background region and the unknown region, so the user does not need to manually draw a trimap on the input image. The foreground probability matrix F and background probability matrix B obtained from these scores are input into the matting implementation function to obtain the transparency value matrix, which is the matrix used to matte the input image. Because the matting implementation function is obtained by training on the first sample images with a preset error back-propagation function, it does not depend heavily on the accuracy of the trimap calibration; it can therefore matte accurately without requiring the user to repeatedly draw trimaps on the input image, and the conversion from input image to transparency value matrix is fully automatic.
  • The matting method of the following embodiment includes the steps below:
  • Step 401: Input the input image and the guidance map matrix into the fully convolutional network to obtain the Fs, Bs and Us of each pixel of the input image.
  • The guidance map matrix indicates the empirical probability that each pixel of an image of the preset image type belongs to the foreground region, the background region or the unknown region.
  • The preset image type is the image type corresponding to the input image. For example, if the preset image type is the half-length portrait type, the guidance map matrix indicates the empirical probability that each pixel of a half-length portrait image belongs to the foreground region, the background region or the unknown region, and it characterizes the empirical position of the portrait in most half-length portrait images; if the preset image type is the full-length portrait type, the guidance map matrix indicates the corresponding empirical probabilities for full-length portrait images and characterizes the empirical position of the portrait in most full-length portrait images.
  • Optionally, the guidance map matrix is trained in advance on the matting sample set, and the second sample images in the matting image set have the same preset image type as the input image.
  • The guidance map matrix guides the prediction of the fully convolutional network for each pixel of the input image, producing for each pixel the predicted score Fs of belonging to the foreground region, the predicted score Bs of belonging to the background region and the predicted score Us of belonging to the unknown region.
  • In this embodiment, the fully convolutional network is a neural network that performs a three-way classification of each pixel of the input image, classifying each pixel into one of the foreground region, the background region and the unknown region.
  • Optionally, the fully convolutional network is pre-trained from the actual per-pixel labels (foreground region, background region or unknown region) of multiple sample images of the predetermined image type.
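  • The filing states only that the network is pre-trained from per-pixel ground-truth labels; it does not name a loss or optimizer. The sketch below assumes a standard per-pixel cross-entropy loss over the three score channels; the loss choice, the label encoding and all names are assumptions.

```python
import torch.nn as nn

def train_step(model, optimizer, batch_images, batch_labels):
    """batch_images: (N, 4, H, W) image plus guidance map; batch_labels: (N, H, W)
    long tensor with 0 = foreground, 1 = background, 2 = unknown (assumed encoding)."""
    criterion = nn.CrossEntropyLoss()
    optimizer.zero_grad()
    scores = model(batch_images)           # (N, 3, H, W): Fs, Bs, Us channels
    loss = criterion(scores, batch_labels)  # per-pixel three-way classification loss
    loss.backward()
    optimizer.step()
    return loss.item()
```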
  • Step 402: Input the Fs, Bs and Us of each pixel of the input image into the first formula (rendered as an image in the original filing) to obtain F, where F is the foreground probability matrix of the input image and exp is the exponential function with the natural constant e as its base. For example, if the Fs, Bs and Us of a pixel are 80, 10 and 10 respectively, the foreground probability of that pixel is 0.985.
  • The foreground probability matrix F is the matrix formed by the foreground probabilities of all pixels of the input image. For example, if the input image contains M rows and N columns of pixels, the foreground probability matrix F contains M rows and N columns of matrix elements, and each matrix element is the probability that one pixel belongs to the foreground region.
  • Step 403: Input the Fs, Bs and Us of each pixel of the input image into the second formula (rendered as an image in the original filing) to obtain B.
  • The background probability matrix B is the matrix formed by the background probabilities of all pixels of the input image. For example, if the input image contains M rows and N columns of pixels, the background probability matrix B contains M rows and N columns of matrix elements, and each matrix element is the probability that one pixel belongs to the background region.
  • It should be noted that step 402 and step 403 are parallel steps: they can be performed simultaneously, or step 402 can be performed before step 403, or step 403 before step 402.
  • Step 404: Input the foreground probability matrix F and the background probability matrix B into the preset matting implementation function to obtain the transparency value matrix of the input image.
  • The matting implementation function is a matting function obtained by training the optimal solution of the matting objective equation on the sample image set with a preset back-propagation algorithm. The first sample images in the sample image set and the input image have the same preset image type; for example, both are half-length portrait images.
  • Optionally, the preset matting implementation function is expressed as f(F, B; λ) = λ(λB + λF + L)⁻¹F, where λ is a parameter obtained by training with the first sample images and L is the matting Laplacian matrix.
  • The matting Laplacian matrix indicates the linear relationship of the transparency value α between adjacent pixels of the input image. Optionally, the matting Laplacian matrix is computed from the input image by a least-squares method. In this step, λ and L can be regarded as known parameters.
  • The transparency value matrix is the matrix used to matte the input image. Optionally, for each pixel of an input image in the RGB color standard, the luminance value of each color component is multiplied by the transparency value at the corresponding position to obtain the matting result of the input image.
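  • Step 404 amounts to solving one sparse linear system per image, alpha = λ(λB + λF + L)⁻¹F, with the probability maps used as diagonal matrices and L the matting Laplacian. A sketch with SciPy, assuming L is already available as a sparse matrix (for example from an existing closed-form-matting implementation); variable names are illustrative.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def matting_function(F_prob, B_prob, L, lam):
    """F_prob, B_prob: (H, W) probability maps; L: (HW, HW) sparse matting Laplacian;
    lam: the trained scalar parameter.
    Returns the (H, W) transparency value matrix alpha = lam * D^{-1} f,
    where D = lam*diag(B) + lam*diag(F) + L and f is the flattened foreground map."""
    h, w = F_prob.shape
    f = F_prob.reshape(-1)
    b = B_prob.reshape(-1)
    D = lam * sp.diags(b) + lam * sp.diags(f) + L
    alpha = lam * spla.spsolve(D.tocsc(), f)
    return np.clip(alpha, 0.0, 1.0).reshape(h, w)
```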
  • In a specific example, with reference to FIG. 4B, the input image 41 and the guidance map matrix 42 are input into the fully convolutional network 43 at the same time to obtain, for each pixel of the input image, the predicted score Fs of belonging to the foreground region, the predicted score Bs of belonging to the background region and the predicted score Us of belonging to the unknown region. The Fs, Bs and Us of each pixel are then input into the two formulas provided in steps 402 and 403 of the embodiment shown in FIG. 4A, and the resulting foreground probability matrix F and background probability matrix B are input into the matting implementation function 44 to obtain the transparency value matrix 45 of the input image.
  • In summary, the matting method provided by this embodiment predicts the input image with a fully convolutional network to obtain, for each pixel, the predicted scores of belonging to the foreground region, the background region and the unknown region, so the user does not need to manually draw a trimap on the input image. The foreground probability matrix F and background probability matrix B obtained from these scores are input into the matting implementation function to obtain the transparency value matrix, which is the matrix used to matte the input image. Because the matting implementation function is obtained by training on the first sample images with a preset error back-propagation function, it does not depend heavily on the accuracy of the trimap calibration; it can therefore matte accurately without requiring the user to repeatedly draw trimaps on the input image, and the conversion from input image to transparency value matrix is fully automatic.
  • The matting method provided by this embodiment also uses the guidance map matrix to assist the prediction of the fully convolutional network. Because the guidance map matrix is trained in advance on a preset sample set, whose second sample images have the same preset image type as the input image, the prediction accuracy of the fully convolutional network when predicting the trimap (Fs, Bs and Us) of the input image is improved.
  • Before fully automatic matting can be performed on the input image, the matting implementation function and the guidance map matrix need to be obtained by training in advance. The embodiments of the present invention describe the training process of the matting implementation function and the training process of the guidance map matrix with reference to the method embodiments shown in FIG. 5 and FIG. 6.
  • FIG. 5 shows a flowchart of the training process of the matting implementation function according to an embodiment of the present invention. This embodiment is described by way of example with the training method applied to a terminal device with image-processing capability. The training method includes the following steps:
  • Step 501: Obtain the foreground probability matrix F, the background probability matrix B and the sample transparency value matrix of the first sample image.
  • Optionally, the foreground probability matrix F and the background probability matrix B of the first sample image are obtained by inputting the first sample image and the guidance map matrix into the fully convolutional network and then applying the two formulas of steps 402 and 403 of the embodiment shown in FIG. 4A to the predicted scores Fs, Bs and Us output by the fully convolutional network for each pixel of the first sample image.
  • The sample transparency value matrix of the first sample image is a relatively accurate transparency value matrix obtained by digital matting with existing techniques; the alpha value corresponding to each pixel in it is known.
  • The way in which the sample transparency value matrix is obtained is not limited in this embodiment. Illustratively, it is obtained by the user labeling the first sample image manually and then processing the labeled first sample image with a matting algorithm; the matting algorithm may be the closed-form matting algorithm.
  • Step 502: Take the optimal solution of the matting objective equation as the initial matting implementation function.
  • Optionally, the matting objective equation is the energy equation min λAᵀBA + λ(A − 1)ᵀF(A − 1) + AᵀLA, where λ is a parameter, F is the foreground probability matrix and B is the background probability matrix. Solving for A so that the energy attains its minimum gives the explicit solution A = λ(λB + λF + L)⁻¹F, from which the initial matting implementation function f(F, B; λ) = λ(λB + λF + L)⁻¹F is obtained.
  • The initial matting implementation function has an initialized parameter λ. Illustratively, the parameter λ is initialized with a random number between 0 and 1, which can be generated by a Gaussian random algorithm.
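  • Treating F and B as the diagonal matrices built from the probability maps, the closed-form solution quoted above follows by differentiating the energy and setting the gradient to zero; the derivation sketch below makes that diagonal-matrix reading explicit, which is an assumption consistent with the formulas in the filing.

```latex
E(A) = \lambda A^{\mathsf T} B A + \lambda (A-\mathbf{1})^{\mathsf T} F (A-\mathbf{1}) + A^{\mathsf T} L A
\qquad
\frac{\partial E}{\partial A} = 2\lambda B A + 2\lambda F (A-\mathbf{1}) + 2 L A = 0
\;\Longrightarrow\;
(\lambda B + \lambda F + L)\,A = \lambda F\,\mathbf{1}
\;\Longrightarrow\;
A = \lambda(\lambda B + \lambda F + L)^{-1} F\,\mathbf{1}.
```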
  • Step 503: Input the foreground probability matrix F and the background probability matrix B of the first sample image into the matting implementation function to obtain the training transparency value matrix of the first sample image.
  • At this point the matting implementation function is used as the forward-propagation function of the training process.
  • The first time the training transparency value matrix of the first sample image is obtained, the parameter λ in the matting implementation function is the initialized parameter. The i-th time it is obtained, the parameter λ is the parameter updated for the (i − 1)-th time by the back-propagation algorithm according to the error, where i is a positive integer greater than 1.
  • Step 504: Correct the parameters of the matting implementation function with the error back-propagation algorithm according to the error between the training transparency value matrix and the sample transparency value matrix.
  • For the first sample image, the sample transparency value matrix characterizes its accurate alpha values, while the training transparency value matrix contains the inaccurate alpha values predicted by the matting implementation function. The terminal device obtains the error of the matting implementation function by comparing the training transparency value matrix with the sample transparency value matrix.
  • Optionally, the error is obtained by comparing the alpha value of each pixel of the sample matting result with the alpha value of the corresponding pixel of the training matting result, giving an alpha-value error for each pixel.
  • The error back-propagation (BP) algorithm is a supervised learning algorithm that iterates repeatedly through the two phases of excitation propagation and weight update until the response of the matting implementation function to the input image reaches a predetermined target range. Optionally, there are several kinds of error back-propagation algorithms; the most commonly used is gradient descent.
  • Optionally, when the matting implementation function is f(F, B; λ) = λ(λB + λF + L)⁻¹F, step 504 includes the following two sub-steps:
  • First, when the error is greater than the preset threshold and the error back-propagation algorithm uses gradient descent, the gradient used by gradient descent is constructed from the partial derivatives of f with respect to F, B and λ (the derivative expressions are rendered as images in the original filing), where f is the matting implementation function, F is the foreground probability matrix, B is the background probability matrix, λ is the parameter trained with the sample images, D = λB + λF + L, L is the known matting Laplacian matrix, and diag is a function used to construct a diagonal matrix.
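  • The three partial-derivative expressions themselves appear only as images in the filing. Differentiating f(F, B; λ) = λD⁻¹F·1 with D = λB + λF + L, and reading F and B as diagonal matrices as above, gives one consistent set of expressions; this is a derivation sketch, not the filing's own notation.

```latex
\frac{\partial f}{\partial \lambda} = D^{-1}\!\left(F\mathbf{1} - (B+F)\,f\right),
\qquad
\frac{\partial f}{\partial F} = \lambda\, D^{-1}\,\mathrm{diag}(\mathbf{1}-f),
\qquad
\frac{\partial f}{\partial B} = -\lambda\, D^{-1}\,\mathrm{diag}(f),
\qquad
D = \lambda B + \lambda F + L .
```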
  • The preset threshold can be set according to the actual situation: the smaller the preset threshold, the higher the required matting accuracy.
  • Second, the parameter λ of the matting implementation function is updated along the gradient with a predetermined step size, so that the training transparency value matrix output by the matting implementation function after the parameter update gradually approaches the sample transparency value matrix.
  • Optionally, after the terminal device updates the parameter λ, steps 503 and 504 are performed in a loop until the error is smaller than the preset threshold.
  • Step 505: Repeat the above correction step; when the error between the training transparency value matrix and the sample transparency value matrix is smaller than the preset threshold, the trained matting implementation function is obtained.
  • When the error between the training transparency value matrix and the sample transparency value matrix is not smaller than the preset threshold, the matting implementation function needs further training; when the error is smaller than the preset threshold, the matting implementation function already meets the accuracy requirement, the training process is stopped, and the trained matting implementation function is obtained.
  • Optionally, the above training process is performed with multiple first sample images. Optionally, the trained matting implementation function is tested with another portion of the first sample images to check whether it reaches the preset accuracy requirement.
  • In summary, the method provided by this embodiment trains the matting implementation function with the error back-propagation algorithm and takes the function whose error is below the preset threshold as the trained matting implementation function. This improves the accuracy of digital matting, and the matting implementation function does not depend heavily on the accuracy of the trimap calibration of the input image: very accurate matting results can be obtained using only the trimap calibration predicted by the fully convolutional network.
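  • A compact sketch of the training loop of FIG. 5, updating only the scalar λ by gradient descent on a squared alpha error; the loss choice and the scalar-gradient formula follow the derivation sketch above and are assumptions, not the filing's notation.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def train_lambda(samples, lam=0.5, step=1e-3, threshold=1e-3, max_iters=1000):
    """samples: list of (F_prob, B_prob, L, alpha_gt) tuples for the first sample images.
    Returns the trained parameter lambda of f(F, B; lambda) = lambda * D^{-1} F."""
    for _ in range(max_iters):
        total_error, grad = 0.0, 0.0
        for F_prob, B_prob, L, alpha_gt in samples:
            f_vec, b_vec = F_prob.reshape(-1), B_prob.reshape(-1)
            a_gt = alpha_gt.reshape(-1)
            D = (lam * sp.diags(b_vec) + lam * sp.diags(f_vec) + L).tocsc()
            alpha = lam * spla.spsolve(D, f_vec)               # forward pass (step 503)
            err = alpha - a_gt
            total_error += float(np.mean(err ** 2))
            # d alpha / d lambda = D^{-1} (f - (b + f) * alpha), per the derivation sketch
            dalpha_dlam = spla.spsolve(D, f_vec - (b_vec + f_vec) * alpha)
            grad += 2.0 * float(err @ dalpha_dlam) / err.size   # d MSE / d lambda
        total_error /= len(samples)
        if total_error < threshold:                             # step 505: stop training
            break
        lam -= step * grad / len(samples)                       # step 504: update lambda
    return lam
```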
  • FIG. 6 shows a flowchart of the training process of the guidance map matrix according to an embodiment of the present invention. This embodiment is described by way of example with the training method applied to a terminal device with image-processing capability. The training method includes the following steps:
  • Step 601: Obtain the {Pi, Mi} corresponding to the n second sample images, where Pi is the feature point set of the foreground target object in the i-th second sample image and Mi is the sample transparency value matrix of the i-th second sample image.
  • A second sample image is an image that contains a foreground target object. The foreground target object is the object expected to be labeled as the foreground region in the matting result; for example, the foreground target object is a portrait.
  • When the guidance map matrix corresponds to the half-length portrait type, the second sample images are all digital images of the half-length portrait type; when it corresponds to the full-length portrait type, the second sample images are all digital images of the full-length portrait type.
  • Step 602: Compute the homography transformation matrix Ti from the Pi of the i-th second sample image.
  • A homography transformation matrix describes a one-to-one point mapping between two images. In this embodiment, the homography transformation matrix indicates the one-to-one point mapping between the second sample image and the guidance map matrix.
  • Step 603: Compute the guidance map matrix according to the formula rendered as an image in the original filing, where M is the guidance map matrix, Σ sums the products of the sample transparency value matrices of all second sample images and their homography transformation matrices, n is the number of second sample images in the matting sample set, and i can be any integer from 1 to n.
  • In summary, the method provided by this embodiment obtains the guidance map matrix from the matting sample set. The guidance map matrix indicates the empirical probability that each pixel of an image of the preset image type belongs to the foreground region, the background region or the unknown region; because the second sample images in the matting image set and the input image have the same preset image type, the training accuracy of the guidance map matrix is improved.
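  • A sketch of steps 601 to 603 with OpenCV: homographies are estimated from corresponding feature points of each second sample image and a chosen reference frame, the sample alpha mattes are warped into that common frame and averaged. The reference frame, the point-correspondence convention and the averaging form M = (1/n) Σ Ti(Mi) are assumptions consistent with the text, not the filing's exact formula.

```python
import cv2
import numpy as np

def build_guidance_map(point_sets, alpha_mattes, ref_points, out_size):
    """point_sets[i]: (K, 2) feature points P_i of the foreground object in the i-th
    second sample image; alpha_mattes[i]: its sample transparency value matrix M_i;
    ref_points: (K, 2) corresponding points in the reference (guidance map) frame;
    out_size: (width, height) of the guidance map."""
    n = len(alpha_mattes)
    acc = np.zeros((out_size[1], out_size[0]), dtype=np.float64)
    for P_i, M_i in zip(point_sets, alpha_mattes):
        # homography T_i mapping the i-th sample into the reference frame (step 602)
        T_i, _ = cv2.findHomography(P_i.astype(np.float32),
                                    ref_points.astype(np.float32), cv2.RANSAC)
        # warp the sample matte with T_i and accumulate (step 603)
        acc += cv2.warpPerspective(M_i.astype(np.float32), T_i, out_size)
    return acc / n  # empirical probability of each pixel belonging to the foreground
```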
  • FIG. 7 shows a block diagram of a matting apparatus according to an embodiment of the present invention. The apparatus has the matting functions of the above examples; the functions may be implemented by hardware, or by hardware executing corresponding software. The apparatus may include a prediction unit 701, a calculation unit 702 and a matting unit 703.
  • The prediction unit 701 has the function of performing steps 201 and 401 above. The calculation unit 702 has the function of performing steps 202, 402 and 403 above. The matting unit 703 has the function of performing steps 203 and 404 above.
  • Optionally, the apparatus may further include a first training unit (not shown in FIG. 7) and a second training unit (not shown in FIG. 7). The first training unit has the function of performing steps 501 to 505 above, and the second training unit has the function of performing steps 601 to 603 above.
  • It should be noted that the prediction unit 701, the calculation unit 702 and the matting unit 703 may be implemented by a processor in the terminal executing one or more programs stored in the memory.
  • An exemplary embodiment of the present invention further provides a terminal that includes the matting apparatus provided by the embodiment shown in FIG. 7 or by the optional embodiments based on the embodiment shown in FIG. 7.
  • FIG. 8 shows a schematic structural diagram of a terminal according to an embodiment of the present invention. For example, the terminal may be a server used to implement the functions of the above method examples. The terminal 800 may include a processor 801, which is configured to implement the functions of the terminal 800 and to perform the steps of the above method embodiments or other steps of the technical solutions described in the present invention.
  • Optionally, the terminal 800 further includes a communication interface 802 used to support communication between the terminal device 800 and other devices. Further, the terminal 800 may include a memory 803 used to store the program code and data of the terminal 800. In addition, the terminal 800 may include a bus 804; the memory 803 and the communication interface 802 are connected to the processor 801 through the bus 804.
  • It can be understood that FIG. 8 only shows a simplified design of the terminal 800. In practical applications, the terminal 800 may include any number of communication interfaces, processors, memories, etc., and all terminals that can implement the embodiments of the present invention fall within the protection scope of the embodiments of the present invention.
  • To implement the above functions, the terminal includes corresponding hardware structures and/or software modules for performing the respective functions. In combination with the modules and algorithm steps of the examples described in the embodiments disclosed in the present invention, the embodiments of the present invention can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered to go beyond the scope of the technical solutions of the embodiments of the present invention.
  • The steps of the methods or algorithms described in connection with the disclosure of the embodiments of the present invention may be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, which may be stored in a random access memory (RAM), a flash memory, a read-only memory (ROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), a register, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM) or any other form of storage medium known in the art.
  • An exemplary storage medium is coupled to the processor so that the processor can read information from and write information to the storage medium; the storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an ASIC, or may exist as discrete components in the terminal device.
  • The functions described in the embodiments of the present invention may be implemented in hardware, software, firmware or any combination thereof. When implemented in software, the functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates the transfer of a computer program from one place to another. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
  • The terms "first", "second", "third", etc. are used to distinguish objects of the same type and are not necessarily used to describe a specific order or sequence; it should be understood that objects used in this way may be interchanged where appropriate, so that the embodiments of the invention can be implemented in orders other than those illustrated or described herein.
  • A person of ordinary skill in the art may understand that all or part of the steps of implementing the above embodiments may be completed by hardware, or by a program instructing the related hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The present invention discloses a matting method and apparatus, belonging to the field of digital matting. The method includes: inputting an input image into a preset fully convolutional network to obtain, for each pixel of the input image, a predicted score Fs of belonging to the foreground region, a predicted score Bs of belonging to the background region and a predicted score Us of belonging to the unknown region; computing, from the Fs, Bs and Us of each pixel of the input image, the foreground probability matrix F and the background probability matrix B corresponding to the input image; and inputting the foreground probability matrix F and the background probability matrix B into a preset matting implementation function to perform matting and obtain the transparency value matrix of the input image. Because the matting implementation function is obtained by training on first sample images with a preset back-propagation algorithm, the method obtains a reasonably accurate matting result without repeatedly calibrating a trimap on the input image, and achieves fully automatic digital matting.

Description

Matting method and apparatus
This application claims priority to Chinese Patent Application No. 201611144676.3, entitled "Matting method and apparatus" and filed with the Chinese Patent Office on December 13, 2016, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
Embodiments of the present invention relate to the field of digital matting, and in particular to a matting method and apparatus.
BACKGROUND
Digital matting is a technique that decomposes a digital image I into a foreground image F and a background image B. The decomposition process of digital matting can be expressed as:
I = αF + (1 − α)B;
where α is a number between 0 and 1, called the transparency value of the digital image or alpha matte. The alpha matrix of the digital image I represents the matting result of I: an alpha value of 1 means the pixel belongs to the foreground, an alpha value of 0 means the pixel belongs to the background, and an alpha value between 0 and 1 means the pixel belongs to a mixed foreground/background region. Because α, F and B must be estimated simultaneously for every pixel of the digital image, where α is single-channel data and the F and B of each pixel are RGB (Red Green Blue) three-channel data, seven unknowns have to be estimated simultaneously for every pixel, which makes the above digital matting technique very difficult to solve accurately.
In the prior art, the alpha values of most pixels of the digital image are calibrated manually by the user, which is also called a trimap. As shown in FIG. 1, for an input image 100 the calibrated image includes: a foreground region 12 whose alpha value the user has calibrated to 1, a background region 14 whose alpha value the user has calibrated to 0, and an unknown region 16 whose alpha value is calibrated as uncertain; the unknown region 16 is the region the matting algorithm needs to estimate. After the user has calibrated the input image manually, a closed-form matting algorithm estimates the foreground and background pixels in the unknown region 16 from the user-specified foreground region 12 and background region 14, obtaining the alpha value of every pixel in the unknown region.
Because it is difficult for the user to specify precisely the trimap that the closed-form matting algorithm needs, obtaining an accurate matting result requires the user to repeatedly re-calibrate, based on the current matting result, the trimap needed for the next matting pass; this process is very time-consuming and depends heavily on the user's expertise.
SUMMARY
In the prior art it is difficult for the user to calibrate precisely the trimap that the closed-form matting algorithm needs; to obtain an accurate matting result, the user has to repeatedly re-calibrate, based on the current matting result, the trimap needed for the next matting pass, and an accurate result is obtained only after several rounds of digital matting; this process is very time-consuming and depends heavily on the user's expertise. Therefore, embodiments of the present invention provide a matting method and apparatus. In the matting method, the input image is predicted with a fully convolutional network to obtain, for each pixel of the input image, a predicted score of belonging to the foreground region, a predicted score of belonging to the background region and a predicted score of belonging to the unknown region, so the user does not need to calibrate the input image manually; at the same time, the foreground probability matrix F and background probability matrix B obtained from these per-pixel scores are input into the matting implementation function to obtain the matting result. Because the matting implementation function is obtained in advance by training on first sample images with a preset back-propagation algorithm, it does not depend heavily on the accuracy of the trimap calibration of the input image, so the method obtains a reasonably accurate matting result without the user calibrating the input image manually several times, and achieves fully automatic digital matting.
As one possible implementation of the present application, the matting method includes:
inputting an input image into a preset fully convolutional network to obtain, for each pixel in the input image, a predicted score Fs of belonging to the foreground region, a predicted score Bs of belonging to the background region and a predicted score Us of belonging to the unknown region; the fully convolutional network is a neural network used to predict the region to which each pixel belongs;
computing, from the Fs, Bs and Us of each pixel in the input image, the foreground probability matrix F and the background probability matrix B corresponding to the input image; the foreground probability matrix F represents the probability that each pixel of the input image belongs to the foreground region, and the background probability matrix B represents the probability that each pixel of the input image belongs to the background region;
inputting the foreground probability matrix F and the background probability matrix B into a preset matting implementation function to perform matting and obtain the transparency value matrix of the input image. The matting implementation function is obtained by training the optimal solution of the matting objective equation on a first sample image with a preset error back-propagation algorithm; the first sample image and the input image have the same preset image type, and the transparency value matrix is the matrix used to matte the input image.
In the present application, the input image is predicted with a preset fully convolutional network to obtain, for each pixel of the input image, the predicted scores of belonging to the foreground region, the background region and the unknown region, so the user does not need to manually calibrate a trimap on the input image; at the same time, the foreground probability matrix F and background probability matrix B obtained from these per-pixel scores are input into the matting implementation function to obtain the transparency value matrix. Because the matting implementation function is obtained by training on a sample image set with a preset back-propagation function, and the sample image set contains a large number of first sample images with the same preset image type as the input image, the matting implementation function can produce an accurate matting result from the foreground probability matrix F and the background probability matrix B without requiring the user to repeatedly calibrate trimaps on the input image, so the conversion from input image to transparency value matrix is fully automatic.
With reference to the first aspect, in a first possible implementation of the first aspect, training the optimal solution of the matting objective equation on the first sample image with the preset back-propagation algorithm includes: obtaining the foreground probability matrix F, the background probability matrix B and the sample transparency value matrix of the first sample image; taking the optimal solution of the matting objective equation as the initial matting implementation function; inputting the foreground probability matrix F and the background probability matrix B of the first sample image into the matting implementation function to obtain a training transparency value matrix of the first sample image; correcting the parameters of the matting implementation function with the error back-propagation algorithm according to the error between the training transparency value matrix and the sample transparency value matrix; and repeating the above correction step until the error between the training transparency value matrix and the sample transparency value matrix is smaller than a preset threshold, which yields the trained matting implementation function.
This optional implementation trains the matting implementation function with the error back-propagation algorithm and takes the matting implementation function whose error is below the preset threshold as the trained matting implementation function, which improves the accuracy of digital matting; moreover, the matting implementation function does not depend heavily on the accuracy of the trimap calibration of the input image.
With reference to the first possible implementation of the first aspect, in a second possible implementation of the first aspect, when the matting implementation function is f(F, B; λ) = λ(λB + λF + L)⁻¹F, correcting the parameters of the matting implementation function with the error back-propagation algorithm according to the error between the training transparency value matrix and the sample transparency value matrix includes: when the error is greater than the preset threshold and the error back-propagation algorithm uses gradient descent, constructing the gradient of gradient descent from the following partial derivatives;
[The three partial-derivative expressions are reproduced as images PCTCN2017100596-appb-000001 to 000003 in the original filing.]
where f is the matting implementation function, F is the foreground probability matrix, B is the background probability matrix, λ is the parameter trained with the first sample image, D = λB + λF + L, L is the known matting Laplacian matrix, and diag is a function used to construct a diagonal matrix.
By using gradient descent and constructing its gradient from the corresponding partial derivatives, this optional implementation lets the training transparency value matrix approach the sample transparency value matrix more quickly and improves the training efficiency when training the matting implementation function.
With reference to the first aspect, the first possible implementation of the first aspect or the second possible implementation, in a third possible implementation, computing the foreground probability matrix F and the background probability matrix B corresponding to the input image from the Fs, Bs and Us of each pixel of the input image includes: inputting the Fs, Bs and Us of each pixel of the input image into the following formula to obtain F:
[The formula is reproduced as image PCTCN2017100596-appb-000004 in the original filing.]
inputting the Fs, Bs and Us of each pixel of the input image into the second formula to obtain B:
[The formula is reproduced as image PCTCN2017100596-appb-000005 in the original filing.]
where exp is the exponential function with the natural constant e as its base.
This optional implementation computes the foreground probability matrix and the background probability matrix of the input image from the two formulas above as the input of the subsequent matting implementation function. Because this process normalizes the Fs, Bs and Us of the input image, it reduces the amount of computation of the subsequent matting implementation function when matting and improves matting efficiency.
With reference to the first aspect or its first, second or third possible implementation, in a fourth possible implementation, inputting the input image into the preset fully convolutional network to obtain, for each pixel of the input image, the predicted score Fs of belonging to the foreground region, the predicted score Bs of belonging to the background region and the predicted score Us of belonging to the unknown region includes: inputting the input image and a guidance map matrix into the fully convolutional network to obtain the Fs, Bs and Us of each pixel of the input image. The guidance map matrix indicates the empirical probability that each pixel of an image of the preset image type belongs to the foreground region, the background region or the unknown region; the guidance map matrix is trained in advance on a matting sample set, and the second sample images in the matting image set have the same preset image type as the input image.
This optional implementation uses the guidance map matrix to assist the prediction of the fully convolutional network. Because the guidance map matrix is trained in advance on the matting sample set, whose second sample images have the same preset image type as the input image, the prediction accuracy of the fully convolutional network when predicting the trimap of the input image is improved.
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation, training the guidance map matrix in advance from the second sample images includes:
obtaining {Pi, Mi} corresponding to the n second sample images, where Pi is the feature point set of the foreground target object in the i-th second sample image and Mi is the sample transparency value matrix of the i-th second sample image; computing the homography transformation matrix Ti from the Pi of the i-th second sample image; and computing the guidance map matrix M according to the following formula:
[The formula is reproduced as image PCTCN2017100596-appb-000006 in the original filing.]
where M is the guidance map matrix, Σ is the summation function and n is the number of second sample images.
This optional implementation obtains the guidance map matrix from the matting sample set; the guidance map matrix indicates the empirical probability that each pixel of an image of the preset image type belongs to the foreground region, the background region or the unknown region. Because the second sample images in the matting image set and the input image have the same preset image type, the training accuracy of the guidance map matrix can be improved.
In a second aspect, an embodiment of the present invention provides a matting apparatus. The matting apparatus includes at least one unit, and the at least one unit is configured to implement the matting method provided by the first aspect or any possible implementation of the first aspect.
In a third aspect, an embodiment of the present invention provides a terminal. The terminal includes one or more processors and a memory; the memory stores one or more programs, the one or more programs are configured to be executed by the one or more processors, and the one or more programs contain instructions for implementing the matting method described in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium. The computer-readable storage medium stores an executable program for implementing the matting method provided by the first aspect or any possible implementation of the first aspect.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of a calibrated input image in the prior art;
FIG. 2 is a flowchart of a matting method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the fully convolutional network involved in the embodiment shown in FIG. 2;
FIG. 4A is a flowchart of a matting method according to another embodiment of the present invention;
FIG. 4B is a schematic diagram of a matting method according to another embodiment of the present invention;
FIG. 5 is a flowchart of the training process of the matting implementation function according to an embodiment of the present invention;
FIG. 6 is a flowchart of training the guidance map matrix according to an embodiment of the present invention;
FIG. 7 is a block diagram of a matting apparatus according to an embodiment of the present invention;
FIG. 8 is a block diagram of a terminal according to an embodiment of the present invention.
DETAILED DESCRIPTION
To make the objectives, technical solutions and advantages of the embodiments of the present invention clearer, the implementations of the present invention are described in further detail below with reference to the accompanying drawings.
Refer to FIG. 2, which shows a flowchart of a matting method according to an embodiment of the present invention. This embodiment is described by way of example with the matting method applied to a terminal device with image-processing capability. The matting method includes the following steps:
Step 201: Input the input image into a preset fully convolutional network to obtain, for each pixel of the input image, a predicted score Fs of belonging to the foreground region, a predicted score Bs of belonging to the background region and a predicted score Us of belonging to the unknown region.
The input image is one frame of digital image. Usually the input image is an image that contains a background region and a foreground region; for example, if the image type of an input image is the half-length portrait type, the foreground region of the input image contains a half-length portrait.
Optionally, the input image is a digital image using the Red Green Blue (RGB) color standard. The input image contains M*N pixels, and each pixel is represented by the three RGB color components. It should be noted that the embodiments of the present invention also apply to black-and-white images or images of other color standards, which is not limited here.
A fully convolutional network (FCN) is a neural network with pixel-level classification capability. In this embodiment, the preset fully convolutional network is a neural network that performs a three-way classification of every pixel of the input image, i.e. classifies each pixel into one of the foreground region, the background region and the unknown region. At the same time, the fully convolutional network can predict, for each pixel of the input image, the predicted score Fs of belonging to the foreground region, the predicted score Bs of belonging to the background region and the predicted score Us of belonging to the unknown region.
Optionally, a fully convolutional network usually includes convolutional layers and deconvolutional layers. The convolutional layers of the fully convolutional network are used to extract feature maps of the input image, and the deconvolutional layers are used to upsample the extracted feature maps. A fully convolutional network has the advantages of a small model size and fast computation.
As shown in FIG. 3, which illustrates an example of a fully convolutional network, the fully convolutional network includes: an input layer, at least one convolutional layer (for example three convolutional layers: a first convolutional layer C1, a second convolutional layer C2 and a third convolutional layer C3), at least one deconvolutional layer (for example three deconvolutional layers: a first deconvolutional layer D1, a second deconvolutional layer D2 and a third deconvolutional layer D3) and an output layer. The input data of the input layer are the input image and the guidance map matrix. The output of the output layer is, for each pixel of the input image, the predicted score Fs of belonging to the foreground region, the predicted score Bs of belonging to the background region and the predicted score Us of belonging to the unknown region. In the embodiments of the present disclosure, the specific structure of the convolutional and deconvolutional layers of the fully convolutional network is not limited; the fully convolutional network shown in FIG. 3 is merely exemplary and explanatory and does not limit the embodiments of the present invention. In general, the more layers a fully convolutional network has, the better its effect but the longer its computation time; in practical applications, a fully convolutional network with an appropriate number of layers can be designed according to the requirements for detection accuracy and efficiency.
The predicted score Fs of belonging to the foreground region indicates the likelihood that the pixel belongs to the foreground region: the larger Fs, the more likely the pixel belongs to the foreground region.
The predicted score Bs of belonging to the background region indicates the likelihood that the pixel belongs to the background region: the larger Bs, the more likely the pixel belongs to the background region.
The predicted score Us of belonging to the unknown region indicates the likelihood that the pixel belongs to the unknown region: the larger Us, the more likely the pixel belongs to the unknown region.
Step 202: Compute the foreground probability matrix F and the background probability matrix B corresponding to the input image from the Fs, Bs and Us of each pixel of the input image.
The foreground probability matrix F represents the probability that each pixel of the input image belongs to the foreground region, and the background probability matrix B represents the probability that each pixel of the input image belongs to the background region.
Step 203: Input the foreground probability matrix F and the background probability matrix B into the preset matting implementation function to obtain the transparency value matrix of the input image.
The matting implementation function is a matting function obtained by training the optimal solution of the matting objective equation on a sample image set with a preset back-propagation algorithm. The first sample images in the sample image set and the input image have the same preset image type; for example, the first sample images and the input image are both half-length portrait images.
Optionally, the preset matting implementation function is expressed by the following formula:
f(F, B; λ) = λ(λB + λF + L)⁻¹F,
where f(F, B; λ) is the function used to solve the transparency value α of each pixel of the input image, λ is a parameter obtained by training with the first sample images, and L is the matting Laplacian matrix.
The terminal device inputs the foreground probability matrix F and the background probability matrix B into the preset matting implementation function to obtain the transparency value matrix of the input image.
The transparency value matrix is the matrix used to matte the input image. Optionally, for each pixel of an input image using the Red Green Blue (RGB) color standard, the luminance value of each of its color components is multiplied by the transparency value at the corresponding position to obtain the matting result of the input image.
In summary, in the matting method provided by this embodiment, the input image is predicted with a fully convolutional network to obtain, for each pixel of the input image, the predicted scores of belonging to the foreground region, the background region and the unknown region, so the user does not need to manually calibrate a trimap on the input image; at the same time, the foreground probability matrix F and background probability matrix B obtained from these per-pixel scores are input into the matting implementation function to obtain the transparency value matrix, which is the matrix used to matte the input image. Because the matting implementation function is obtained by training on the first sample images with a preset error back-propagation function, it does not depend heavily on the accuracy of the trimap calibration, so it can matte accurately without requiring the user to repeatedly calibrate trimaps on the input image, and the conversion from input image to transparency value matrix is fully automatic.
FIG. 4A shows a flowchart of a matting method according to another embodiment of the present invention. This embodiment is described by way of example with the matting method applied to a terminal device with image-processing capability. The matting method includes the following steps:
Step 401: Input the input image and the guidance map matrix into the fully convolutional network to obtain the Fs, Bs and Us of each pixel of the input image.
The guidance map matrix indicates the empirical probability that each pixel of an image of the preset image type belongs to the foreground region, the background region or the unknown region.
The preset image type is the image type corresponding to the input image. For example, if the preset image type is the half-length portrait type, the guidance map matrix indicates the empirical probability that each pixel of a half-length portrait image belongs to the foreground region, the background region or the unknown region, and it characterizes the empirical position of the portrait in most half-length portrait images; for another example, if the preset image type is the full-length portrait type, the guidance map matrix indicates the empirical probability that each pixel of a full-length portrait image belongs to the foreground region, the background region or the unknown region, and it characterizes the empirical position of the portrait in most full-length portrait images.
Optionally, the guidance map matrix is trained in advance on the matting sample set, and the second sample images in the matting image set have the same preset image type as the input image.
The guidance map matrix is used to guide the prediction of the fully convolutional network for each pixel of the input image, producing for each pixel of the input image the predicted score Fs of belonging to the foreground region, the predicted score Bs of belonging to the background region and the predicted score Us of belonging to the unknown region.
In this embodiment, the fully convolutional network is a neural network that performs a three-way classification of each pixel of the input image, classifying each pixel into one of the foreground region, the background region and the unknown region. Optionally, the fully convolutional network is pre-trained from the actual per-pixel values (belonging to the foreground region, the background region or the unknown region) of multiple sample images of the predetermined image type.
After the input image and the guidance map matrix are input into the fully convolutional network, the Fs, Bs and Us of each pixel of the input image can be predicted.
Step 402: Input the Fs, Bs and Us of each pixel of the input image into the following formula to obtain F:
[The formula is reproduced as image PCTCN2017100596-appb-000007 in the original filing.]
where F is the foreground probability matrix of the input image and exp is the exponential function with the natural constant e as its base. For example, if the Fs, Bs and Us of a pixel are 80, 10 and 10 respectively, the foreground probability of that pixel is 0.985.
The foreground probability matrix F is the matrix formed by the foreground probabilities of all pixels of the input image. For example, if the input image contains M rows and N columns of pixels, the foreground probability matrix F contains M rows and N columns of matrix elements, and each matrix element is the probability that one pixel belongs to the foreground region.
Step 403: Input the Fs, Bs and Us of each pixel of the input image into the following formula to obtain B:
[The formula is reproduced as image PCTCN2017100596-appb-000008 in the original filing.]
The background probability matrix B is the matrix formed by the background probabilities of all pixels of the input image. For example, if the input image contains M rows and N columns of pixels, the background probability matrix B contains M rows and N columns of matrix elements, and each matrix element is the probability that one pixel belongs to the background region.
It should be noted that step 402 and step 403 are parallel steps: they can be performed simultaneously, or step 402 can be performed before step 403, or step 403 before step 402.
Step 404: Input the foreground probability matrix F and the background probability matrix B into the preset matting implementation function to obtain the transparency value matrix of the input image.
The matting implementation function is a matting function obtained by training the optimal solution of the matting objective equation on a sample image set with a preset back-propagation algorithm. The first sample images in the sample image set and the input image have the same preset image type; for example, the first sample images and the input image are both half-length portrait images.
Optionally, the preset matting implementation function is expressed by the following formula:
f(F, B; λ) = λ(λB + λF + L)⁻¹F,
where f(F, B; λ) is the function used to solve the transparency value α of each pixel of the input image, λ is a parameter obtained by training with the first sample images, and L is the matting Laplacian matrix.
The matting Laplacian matrix indicates the linear relationship of the transparency value α between adjacent pixels of the input image. Optionally, the matting Laplacian matrix is computed from the input image by a least-squares method. In this step, λ and L can be regarded as known parameters.
The transparency value matrix is the matrix used to matte the input image. Optionally, for each pixel of an input image using the Red Green Blue (RGB) color standard, the luminance value of each of its color components is multiplied by the transparency value at the corresponding position to obtain the matting result of the input image.
In a specific example, with reference to FIG. 4B, the input image 41 and the guidance map matrix 42 are input into the fully convolutional network 43 at the same time to obtain, for each pixel of the input image, the predicted score Fs of belonging to the foreground region, the predicted score Bs of belonging to the background region and the predicted score Us of belonging to the unknown region; the Fs, Bs and Us of each pixel of the input image are then input into the two formulas provided in steps 402 and 403 of the embodiment shown in FIG. 4A, and the resulting foreground probability matrix F and background probability matrix B are input into the matting implementation function 44 to obtain the transparency value matrix 45 of the input image.
In summary, in the matting method provided by this embodiment, the input image is predicted with a fully convolutional network to obtain, for each pixel of the input image, the predicted scores of belonging to the foreground region, the background region and the unknown region, so the user does not need to manually calibrate a trimap on the input image; at the same time, the foreground probability matrix F and background probability matrix B obtained from these per-pixel scores are input into the matting implementation function to obtain the transparency value matrix, which is the matrix used to matte the input image. Because the matting implementation function is obtained by training on the first sample images with a preset error back-propagation function, it does not depend heavily on the accuracy of the trimap calibration, so it can matte accurately without requiring the user to repeatedly calibrate trimaps on the input image, and the conversion from input image to transparency value matrix is fully automatic.
The matting method provided by this embodiment also uses the guidance map matrix to assist the prediction of the fully convolutional network. Because the guidance map matrix is trained in advance on the matting sample set, whose second sample images have the same preset image type as the input image, the prediction accuracy of the fully convolutional network when predicting the trimap (Fs, Bs and Us) of the input image is improved.
Before fully automatic matting is performed on the input image, the matting implementation function and the guidance map matrix need to be obtained by training in advance. The embodiments of the present invention describe the training process of the matting implementation function and the training process of the guidance map matrix with reference to the method embodiments shown in FIG. 5 and FIG. 6.
In the embodiment of FIG. 5, the matting implementation function is trained with the error back-propagation algorithm used in neural networks. Refer to FIG. 5, which shows a flowchart of the training process of the matting implementation function according to an embodiment of the present invention. This embodiment is described by way of example with the training method applied to a terminal device with image-processing capability. The training method includes the following steps:
Step 501: Obtain the foreground probability matrix F, the background probability matrix B and the sample transparency value matrix of the first sample image.
Optionally, the foreground probability matrix F and the background probability matrix B of the first sample image are obtained by inputting the first sample image and the guidance map matrix into the fully convolutional network and then computing them, with the two formulas provided in steps 402 and 403 of the embodiment shown in FIG. 4A, from the predicted scores Fs, Bs and Us of belonging to the foreground region, the background region and the unknown region output by the fully convolutional network for each pixel of the first sample image.
The sample transparency value matrix of the first sample image is a relatively accurate transparency value matrix obtained by digital matting with existing techniques. The alpha value corresponding to each pixel in the sample transparency value matrix of the first sample image is known.
The way in which the sample transparency value matrix is obtained is not limited in this embodiment. Illustratively, it is obtained by the user labeling the first sample image manually and then processing the labeled first sample image with a matting algorithm; the matting algorithm may be the closed-form matting algorithm.
Step 502: Take the optimal solution of the matting objective equation as the initial matting implementation function.
Optionally, the matting objective equation is the following energy equation:
min λAᵀBA + λ(A − 1)ᵀF(A − 1) + AᵀLA,
where λ is a parameter, F is the foreground probability matrix and B is the background probability matrix; A is solved so that the above energy equation attains its minimum. That is, the above energy equation has an explicit solution:
A = λ(λB + λF + L)⁻¹F,
and from this solution the matting implementation function f(F, B; λ) is obtained. That is, the initial matting implementation function is
f(F, B; λ) = λ(λB + λF + L)⁻¹F.
The initial matting implementation function has an initialized parameter λ. Illustratively, the parameter λ is initialized with a random number between 0 and 1, which can be obtained by a Gaussian random algorithm.
Step 503: Input the foreground probability matrix F and the background probability matrix B of the first sample image into the matting implementation function to obtain the training transparency value matrix of the first sample image.
At this point the matting implementation function is used as the forward-propagation function of the training process.
The first time the training transparency value matrix of the first sample image is obtained, the parameter λ in the matting implementation function is the initialized parameter. The i-th time the training transparency value matrix of the first sample image is obtained, the parameter λ in the matting implementation function is the parameter λ updated for the (i − 1)-th time by the back-propagation algorithm according to the error, where i is a positive integer greater than 1.
Step 504: Correct the parameters of the matting implementation function with the error back-propagation algorithm according to the error between the training transparency value matrix and the sample transparency value matrix.
For the first sample image, the sample transparency value matrix characterizes its accurate alpha values, while the training transparency value matrix contains the inaccurate alpha values predicted by the matting implementation function. The terminal device obtains the error of the matting implementation function by comparing the training transparency value matrix with the sample transparency value matrix.
Optionally, the error is obtained by comparing the alpha value of each pixel of the sample matting result with the alpha value of the corresponding pixel of the training matting result, giving an alpha-value error for each pixel.
The error back-propagation (BP) algorithm is a supervised learning algorithm that iterates repeatedly through the two phases of excitation propagation and weight update until the response of the matting implementation function to the input image reaches a predetermined target range.
Optionally, there are several kinds of error back-propagation algorithms; the most commonly used is gradient descent.
Optionally, when the matting implementation function is f(F, B; λ) = λ(λB + λF + L)⁻¹F, step 504 includes the following two sub-steps:
First, when the error is greater than the preset threshold and the error back-propagation algorithm uses gradient descent, the gradient of gradient descent is constructed from the following partial derivatives;
[The three partial-derivative expressions are reproduced as images PCTCN2017100596-appb-000009 to 000011 in the original filing.]
where f is the matting implementation function, F is the foreground probability matrix, B is the background probability matrix, λ is the parameter trained with the sample images, D = λB + λF + L, L is the known matting Laplacian matrix, and diag is a function used to construct a diagonal matrix. The preset threshold can be set according to the actual situation: the smaller the preset threshold, the higher the required matting accuracy.
Second, the parameter λ of the matting implementation function is updated along the gradient with a predetermined step size, so that the training transparency value matrix output by the matting implementation function after the parameter update gradually approaches the sample transparency value matrix.
Optionally, after the terminal device updates the parameter λ of the matting implementation function, steps 503 to 504 are performed in a loop until the error is smaller than the preset threshold.
Step 505: Repeat the above correction step; when the error between the training transparency value matrix and the sample transparency value matrix is smaller than the preset threshold, the trained matting implementation function is obtained.
When the error between the training transparency value matrix and the sample transparency value matrix is not smaller than the preset threshold, the matting implementation function needs to continue training; when the error is smaller than the preset threshold, the matting implementation function already meets the accuracy requirement, the training process is stopped, and the trained matting implementation function is obtained.
Optionally, the above training process is performed with multiple first sample images.
Optionally, the trained matting implementation function is tested with another portion of the first sample images to test whether the matting implementation function can reach the preset accuracy requirement.
In summary, the matting method provided by this embodiment trains the matting implementation function with the error back-propagation algorithm and takes the matting implementation function whose error is below the preset threshold as the trained matting implementation function. This can improve the accuracy of digital matting; moreover, the matting implementation function does not depend heavily on the accuracy of the trimap calibration of the input image, and very accurate matting results can be obtained using only the trimap calibration predicted by the fully convolutional network.
Refer to FIG. 6, which shows a flowchart of the training process of the guidance map matrix according to an embodiment of the present invention. This embodiment is described by way of example with the training method applied to a terminal device with image-processing capability. The training method includes the following steps:
Step 601: Obtain the {Pi, Mi} corresponding to the n second sample images.
Here, Pi is the feature point set of the foreground target object in the i-th second sample image, and Mi is the sample transparency value matrix of the i-th second sample image.
A second sample image is an image containing a foreground target object. The foreground target object is the object that is expected to be labeled as the foreground region in the matting result; for example, the foreground target object is a portrait.
When the guidance map matrix corresponds to the half-length portrait type, the second sample images are all digital images of the half-length portrait type; when the guidance map matrix corresponds to the full-length portrait type, the second sample images are all digital images of the full-length portrait type.
Step 602: Compute the homography transformation matrix Ti from the Pi of the i-th second sample image.
A homography transformation matrix describes a one-to-one point mapping between two images. In this embodiment, the homography transformation matrix indicates the one-to-one point mapping between the second sample image and the guidance map matrix.
Step 603: Compute the guidance map matrix according to the following formula:
[The formula is reproduced as image PCTCN2017100596-appb-000012 in the original filing.]
where M is the guidance map matrix, Σ sums the products of the sample transparency value matrices of all second sample images and their homography transformation matrices, n is the number of second sample images in the matting sample set, and i can be any integer from 1 to n.
In summary, the matting method provided by this embodiment obtains the guidance map matrix from the matting sample set. The guidance map matrix indicates the empirical probability that each pixel of an image of the preset image type belongs to the foreground region, the background region or the unknown region; because the second sample images in the matting image set and the input image have the same preset image type, the training accuracy of the guidance map matrix can be improved.
下述为本发明装置实施例,可以用于执行本发明方法实施例。对于本发明装置实施例中 未披露的细节,请参照本发明方法实施例。
请参考图7,其示出了本发明一个实施例提供的抠图装置的框图,该装置具有实现上述示例中的抠图功能,该功能可以通过硬件实现,也可以通过硬件执行相应的软件的结合实现。该装置可以包括:预测单元701、计算单元702和抠图单元703。
预测单元701,具有执行上述步骤201和步骤401的功能。
计算单元702,具有执行上述步骤202、步骤402和步骤403的功能。
抠图单元703,具有执行上述步骤202和步骤404的功能。
可选地,该装置还可以包括第一训练单元(图7中未示出)和第二训练单元(图7中未示出)。其中,第一训练单元具有执行上述步骤501至步骤505的功能,第二训练单元具有执行上述步骤601至步骤603的功能。
It should be noted that the above prediction unit 701, calculation unit 702 and matting unit 703 may be implemented by a processor in the terminal executing one or more programs stored in a memory.
An exemplary embodiment of the present invention further provides a terminal, which includes the matting apparatus provided by the embodiment shown in FIG. 7 or by the optional embodiments based on the embodiment shown in FIG. 7.
It should be noted that when the apparatus provided by the above embodiments implements its functions, the division into the above functional modules is merely used as an example for description. In practical applications, the above functions may be allocated to different functional modules as required, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the apparatus provided by the above embodiments and the method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, and details are not repeated here.
Please refer to FIG. 8, which shows a schematic structural diagram of a terminal provided by an embodiment of the present invention. For example, the terminal may be a server used to implement the functions of the above method examples. The terminal 800 may include a processor 801.
The processor 801 is configured to implement the functions of the terminal 800. The processor 801 is further configured to perform the steps in the above method embodiments, or other steps of the technical solutions described in the present invention.
Optionally, the terminal 800 further includes a communication interface 802. The communication interface 802 is configured to support communication between the terminal device 800 and other devices.
Further, the terminal 800 may further include a memory 803, which is configured to store program code and data of the terminal 800.
In addition, the terminal 800 may further include a bus 804. The memory 803 and the communication interface 802 are connected to the processor 801 through the bus 804.
It can be understood that FIG. 8 shows only a simplified design of the terminal 800. In practical applications, the terminal 800 may include any number of communication interfaces, processors, memories and the like, and all terminals that can implement the embodiments of the present invention fall within the protection scope of the embodiments of the present invention.
The above mainly introduces the solutions provided by the embodiments of the present invention from the perspective of the terminal. It can be understood that, to implement the above functions, the terminal includes corresponding hardware structures and/or software modules for performing the respective functions. The modules and algorithm steps of the examples described in connection with the embodiments disclosed in the present invention can be implemented in the form of hardware, or a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art may use different methods to implement the described functions for each specific application, but such implementation should not be considered as going beyond the scope of the technical solutions of the embodiments of the present invention.
The steps of the methods or algorithms described in connection with the disclosure of the embodiments of the present invention may be implemented in hardware, or by a processor executing software instructions. The software instructions may consist of corresponding software modules, and the software modules may be stored in a random access memory (Random Access Memory, RAM), a flash memory, a read-only memory (Read Only Memory, ROM), an erasable programmable read-only memory (Erasable Programmable ROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), a register, a hard disk, a removable hard disk, a compact disc read-only memory (CD-ROM), or any other form of storage medium well known in the art. An exemplary storage medium is coupled to the processor, so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be a component of the processor. The processor and the storage medium may be located in an ASIC. Of course, the processor and the storage medium may also exist in the terminal device as discrete components.
Those skilled in the art should be aware that, in the above one or more examples, the functions described in the embodiments of the present invention may be implemented by hardware, software, firmware or any combination thereof. When implemented by software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium. Computer-readable media include computer storage media and communication media, where communication media include any medium that facilitates transfer of a computer program from one place to another. A storage medium may be any available medium accessible to a general-purpose or special-purpose computer.
The serial numbers of the above embodiments of the present invention are merely for description and do not represent the superiority or inferiority of the embodiments.
In the embodiments of the present invention, the terms "first", "second", "third" and the like (if any) are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that objects so used are interchangeable under appropriate circumstances, so that the embodiments of the present invention can be implemented in orders other than those illustrated or described herein.
Those of ordinary skill in the art can understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing related hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc or the like.
The above are only some embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (13)

  1. A matting method, wherein the method comprises:
    inputting an input image into a preset fully convolutional network to obtain, for each pixel of the input image, a prediction score Fs of belonging to a foreground region, a prediction score Bs of belonging to a background region and a prediction score Us of belonging to an unknown region, the fully convolutional network being a neural network used to predict the region to which each pixel belongs;
    calculating, according to the Fs, the Bs and the Us of each pixel of the input image, a foreground probability matrix F and a background probability matrix B corresponding to the input image, the foreground probability matrix F being used to represent the probability that each pixel of the input image belongs to the foreground region, and the background probability matrix B being used to represent the probability that each pixel of the input image belongs to the background region;
    inputting the foreground probability matrix F and the background probability matrix B into a preset matting implementation function to obtain a transparency value matrix of the input image, the matting implementation function being obtained by training the optimal solution of a matting objective equation with a first sample image using a preset error back-propagation algorithm, the first sample image and the input image having the same preset image type, and the transparency value matrix being a matrix used for matting the input image.
  2. The matting method according to claim 1, wherein the matting implementation function being obtained by training the optimal solution of the matting objective equation with the first sample image using a preset back-propagation algorithm comprises:
    obtaining the foreground probability matrix F, the background probability matrix B and a sample transparency value matrix of the first sample image;
    using the optimal solution of the matting objective equation as an initial matting implementation function;
    inputting the foreground probability matrix F and the background probability matrix B of the first sample image into the matting implementation function to obtain a training transparency value matrix of the first sample image;
    correcting a parameter in the matting implementation function by the error back-propagation algorithm according to an error between the training transparency value matrix and the sample transparency value matrix;
    repeating the above correction step, and obtaining the trained matting implementation function when the error between the training transparency value matrix and the sample transparency value matrix is smaller than a preset threshold.
  3. The matting method according to claim 2, wherein, when the matting implementation function is f(F, B; λ) = λ(λB + λF + L)^(-1) F, the correcting a parameter in the matting implementation function by the error back-propagation algorithm according to the error between the training transparency value matrix and the sample transparency value matrix comprises:
    when the error is greater than the preset threshold and the error back-propagation algorithm uses gradient descent, constructing the gradient in the gradient descent method from the following partial derivatives;
    [Formula images PCTCN2017100596-appb-100001 to PCTCN2017100596-appb-100003]
    updating the parameter λ in the matting implementation function along the gradient with a predetermined step size, so that the training transparency value matrix output by the matting implementation function with the updated parameter gradually approaches the sample transparency value matrix;
    wherein f is the matting implementation function, F is the foreground probability matrix, B is the background probability matrix, λ is the parameter trained with the first sample image, D = λB + λF + L, L is the known matting Laplacian matrix, and diag is a function used to construct a diagonal matrix.
  4. The matting method according to any one of claims 1 to 3, wherein the calculating, according to the Fs, the Bs and the Us of each pixel of the input image, the foreground probability matrix F and the background probability matrix B corresponding to the input image comprises:
    inputting the Fs, the Bs and the Us of each pixel of the input image into the following formula to obtain the F:
    [Formula image PCTCN2017100596-appb-100004]
    inputting the Fs, the Bs and the Us of each pixel of the input image into the following formula to obtain the B:
    [Formula image PCTCN2017100596-appb-100005]
    wherein exp is the exponential function with the natural constant e as its base.
  5. The matting method according to any one of claims 1 to 4, wherein the inputting an input image into a preset fully convolutional network to obtain, for each pixel of the input image, the prediction score Fs of belonging to the foreground region, the prediction score Bs of belonging to the background region and the prediction score Us of belonging to the unknown region comprises:
    inputting the input image and a guide map matrix into the fully convolutional network to obtain the Fs, the Bs and the Us of each pixel of the input image, the guide map matrix being used to indicate empirical probability values of each pixel of an image of the preset image type belonging to the foreground region, the background region and the unknown region, the guide map matrix being obtained by training in advance with second sample images, and the second sample images and the input image having the same preset image type.
  6. The method according to claim 5, wherein the guide map matrix being obtained by training in advance with second sample images comprises:
    obtaining {P_i, M_i} corresponding to n second sample images, wherein P_i is a feature point set of a foreground target object in the i-th second sample image, and M_i is a sample transparency value matrix of the i-th second sample image;
    calculating a homography transform matrix T_i from P_i of the i-th second sample image;
    calculating the guide map matrix M according to the following formula:
    [Formula image PCTCN2017100596-appb-100006]
    wherein M is the guide map matrix, Σ is a summation function, and n is the number of the second sample images.
  7. A matting apparatus, wherein the apparatus comprises:
    a prediction unit, configured to input an input image into a preset fully convolutional network to obtain, for each pixel of the input image, a prediction score Fs of belonging to a foreground region, a prediction score Bs of belonging to a background region and a prediction score Us of belonging to an unknown region, the fully convolutional network being a neural network used to predict the region to which each pixel belongs;
    a calculation unit, configured to calculate, according to the Fs, the Bs and the Us of each pixel of the input image, a foreground probability matrix F and a background probability matrix B corresponding to the input image, the foreground probability matrix F being used to represent the probability that each pixel of the input image belongs to the foreground region, and the background probability matrix B being used to represent the probability that each pixel of the input image belongs to the background region;
    a matting unit, configured to input the foreground probability matrix F and the background probability matrix B into a preset matting implementation function for matting to obtain a transparency value matrix of the input image, the matting implementation function being obtained by training the optimal solution of a matting objective equation with a first sample image using a preset error back-propagation algorithm, the first sample image and the input image having the same preset image type, and the transparency value matrix being a matrix used for matting the input image.
  8. The matting apparatus according to claim 7, wherein the apparatus further comprises:
    a first training unit, configured to obtain the foreground probability matrix F, the background probability matrix B and a sample transparency value matrix of the first sample image; use the optimal solution of the matting objective equation as an initial matting implementation function; input the foreground probability matrix F and the background probability matrix B of the first sample image into the matting implementation function for matting to obtain a training transparency value matrix of the first sample image; correct a parameter in the matting implementation function by the error back-propagation algorithm according to an error between the training transparency value matrix and the sample transparency value matrix; and repeat the above correction step, and obtain the trained matting implementation function when the error between the training transparency value matrix and the sample transparency value matrix is smaller than the preset threshold.
  9. The matting apparatus according to claim 8, wherein, when the matting implementation function is f(F, B; λ) = λ(λB + λF + L)^(-1) F, the first training unit is configured to: when the error is greater than the preset threshold and the error back-propagation algorithm uses gradient descent, construct the gradient in the gradient descent method from the following partial derivatives;
    [Formula images PCTCN2017100596-appb-100007 to PCTCN2017100596-appb-100009]
    update the parameter λ in the matting implementation function along the gradient with a predetermined step size, so that the training transparency value matrix output by the matting implementation function with the updated parameter gradually approaches the sample transparency value matrix;
    wherein f is the matting implementation function, F is the foreground probability matrix, B is the background probability matrix, λ is the parameter trained with the first sample image, D = λB + λF + L, L is the known matting Laplacian matrix, and diag is a function used to construct a diagonal matrix.
  10. The matting apparatus according to any one of claims 7 to 9, wherein the calculation unit is configured to input the Fs, the Bs and the Us of each pixel of the input image into the following formula to obtain the F:
    [Formula image PCTCN2017100596-appb-100010]
    and input the Fs, the Bs and the Us of each pixel of the input image into the following formula to obtain the B:
    [Formula image PCTCN2017100596-appb-100011]
    wherein exp is the exponential function with the natural constant e as its base.
  11. The apparatus according to any one of claims 7 to 10, wherein the prediction unit is configured to input the input image and a guide map matrix into the fully convolutional network to obtain the Fs, the Bs and the Us of each pixel of the input image, the guide map matrix being used to indicate empirical probability values of each pixel of an image of the preset image type belonging to the foreground region, the background region and the unknown region, the guide map matrix being obtained by training in advance with second sample images, and the second sample images and the input image having the same preset image type.
  12. The matting apparatus according to claim 11, wherein the apparatus further comprises:
    a second training unit, configured to obtain {P_i, M_i} corresponding to n second sample images, wherein P_i is a feature point set of a foreground target object in the i-th second sample image, and M_i is a sample transparency value matrix of the i-th second sample image; calculate a homography transform matrix T_i from P_i of the i-th second sample image; and calculate the guide map matrix according to the following formula:
    wherein M is the guide map matrix, Σ is a summation function, and n is the number of the second sample images.
  13. A terminal, wherein the terminal comprises:
    one or more processors; and
    a memory;
    wherein the memory stores one or more programs, the one or more programs are configured to be executed by the one or more processors, and the one or more programs include instructions for implementing the matting method according to any one of claims 1 to 6.
PCT/CN2017/100596 2016-12-13 2017-09-05 抠图方法及装置 WO2018107825A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611144676.3A CN108460770B (zh) 2016-12-13 2016-12-13 抠图方法及装置
CN201611144676.3 2016-12-13

Publications (1)

Publication Number Publication Date
WO2018107825A1 true WO2018107825A1 (zh) 2018-06-21

Family

ID=62559637

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/100596 WO2018107825A1 (zh) 2016-12-13 2017-09-05 抠图方法及装置

Country Status (2)

Country Link
CN (1) CN108460770B (zh)
WO (1) WO2018107825A1 (zh)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108961303A (zh) * 2018-07-23 2018-12-07 北京旷视科技有限公司 一种图像处理方法、装置、电子设备和计算机可读介质
CN108986132A (zh) * 2018-07-04 2018-12-11 华南理工大学 一种使用全卷积神经网络生成证件照Trimap图的方法
CN111784564A (zh) * 2020-06-30 2020-10-16 稿定(厦门)科技有限公司 自动抠图方法及***
CN112581480A (zh) * 2020-12-22 2021-03-30 深圳市雄帝科技股份有限公司 自动抠图方法、***及其可读存储介质
CN112801896A (zh) * 2021-01-19 2021-05-14 西安理工大学 基于前景提取的逆光图像增强方法
CN112884776A (zh) * 2021-01-22 2021-06-01 浙江大学 一种基于合成数据集增广的深度学习抠图方法
CN113191956A (zh) * 2021-01-19 2021-07-30 西安理工大学 基于深度抠图的逆光图像增强方法
CN113487630A (zh) * 2021-07-14 2021-10-08 辽宁向日葵教育科技有限公司 一种基于材质分析技术的抠像方法

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109493363B (zh) * 2018-09-11 2019-09-27 北京达佳互联信息技术有限公司 一种基于测地距离的抠图处理方法、装置和图像处理设备
CN110969641A (zh) * 2018-09-30 2020-04-07 北京京东尚科信息技术有限公司 图像处理方法和装置
CN109461167B (zh) * 2018-11-02 2020-07-21 Oppo广东移动通信有限公司 图像处理模型的训练方法、抠图方法、装置、介质及终端
CN109920018A (zh) * 2019-01-23 2019-06-21 平安科技(深圳)有限公司 基于神经网络的黑白照片色彩恢复方法、装置及存储介质
CN109829925B (zh) * 2019-01-23 2020-12-25 清华大学深圳研究生院 一种在抠图任务中提取干净前景的方法及模型训练方法
CN110070507B (zh) * 2019-04-17 2021-03-02 安徽科朗电子科技有限公司 一种视频图像的抠图方法、装置、存储介质及抠图设备
CN110322468A (zh) * 2019-06-04 2019-10-11 广东工业大学 一种图像自动编辑方法
CN111223106B (zh) * 2019-10-28 2022-08-09 稿定(厦门)科技有限公司 全自动人像蒙版抠图方法及***
CN111091535A (zh) * 2019-11-22 2020-05-01 三一重工股份有限公司 基于深度学习图像语义分割的工厂管理方法和***
CN113052755A (zh) * 2019-12-27 2021-06-29 杭州深绘智能科技有限公司 一种基于深度学习的高分辨率图像智能化抠图方法
CN111833355A (zh) * 2020-06-05 2020-10-27 杭州艺旗网络科技有限公司 一种抠取图片的方法
CN114792325A (zh) * 2021-01-25 2022-07-26 清华大学 一种人像抠图方法及***
CN113628221B (zh) * 2021-08-03 2024-06-21 Oppo广东移动通信有限公司 图像处理方法、图像分割模型训练方法及相关装置
CN115708126A (zh) * 2021-08-18 2023-02-21 北京字跳网络技术有限公司 一种图像处理方法、装置、设备及存储介质
CN113838084A (zh) * 2021-09-26 2021-12-24 上海大学 基于编解码器网络和引导图的抠图方法
CN113657402B (zh) * 2021-10-18 2022-02-01 北京市商汤科技开发有限公司 抠像处理方法、装置、电子设备及存储介质
CN115496776A (zh) * 2022-09-13 2022-12-20 北京百度网讯科技有限公司 抠图方法、抠图模型的训练方法及装置、设备、介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050105787A1 (en) * 2002-05-03 2005-05-19 Vialogy Corp., A Delaware Corporation Technique for extracting arrayed data
CN101261739A (zh) * 2008-03-25 2008-09-10 武汉大学 一种具有纠偏性的自然图像抠图的全局优化方法
CN103400386A (zh) * 2013-07-30 2013-11-20 清华大学深圳研究生院 一种用于视频中的交互式图像处理方法
CN103942794A (zh) * 2014-04-16 2014-07-23 南京大学 一种基于置信度的图像协同抠图方法
CN105590307A (zh) * 2014-10-22 2016-05-18 华为技术有限公司 基于透明度的抠图方法和装置

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7190809B2 (en) * 2002-06-28 2007-03-13 Koninklijke Philips Electronics N.V. Enhanced background model employing object classification for improved background-foreground segmentation
CN104063865B (zh) * 2014-06-27 2017-08-01 小米科技有限责任公司 分类模型创建方法、图像分割方法及相关装置
CN104966274B (zh) * 2015-06-12 2019-01-29 杭州电子科技大学 一种采用图像检测与区域提取的局部模糊复原方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050105787A1 (en) * 2002-05-03 2005-05-19 Vialogy Corp., A Delaware Corporation Technique for extracting arrayed data
CN101261739A (zh) * 2008-03-25 2008-09-10 武汉大学 一种具有纠偏性的自然图像抠图的全局优化方法
CN103400386A (zh) * 2013-07-30 2013-11-20 清华大学深圳研究生院 一种用于视频中的交互式图像处理方法
CN103942794A (zh) * 2014-04-16 2014-07-23 南京大学 一种基于置信度的图像协同抠图方法
CN105590307A (zh) * 2014-10-22 2016-05-18 华为技术有限公司 基于透明度的抠图方法和装置

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108986132A (zh) * 2018-07-04 2018-12-11 华南理工大学 一种使用全卷积神经网络生成证件照Trimap图的方法
CN108986132B (zh) * 2018-07-04 2020-10-27 华南理工大学 一种使用全卷积神经网络生成证件照Trimap图的方法
CN108961303A (zh) * 2018-07-23 2018-12-07 北京旷视科技有限公司 一种图像处理方法、装置、电子设备和计算机可读介质
CN108961303B (zh) * 2018-07-23 2021-05-07 北京旷视科技有限公司 一种图像处理方法、装置、电子设备和计算机可读介质
CN111784564A (zh) * 2020-06-30 2020-10-16 稿定(厦门)科技有限公司 自动抠图方法及***
CN112581480A (zh) * 2020-12-22 2021-03-30 深圳市雄帝科技股份有限公司 自动抠图方法、***及其可读存储介质
CN112801896A (zh) * 2021-01-19 2021-05-14 西安理工大学 基于前景提取的逆光图像增强方法
CN113191956A (zh) * 2021-01-19 2021-07-30 西安理工大学 基于深度抠图的逆光图像增强方法
CN112801896B (zh) * 2021-01-19 2024-02-09 西安理工大学 基于前景提取的逆光图像增强方法
CN113191956B (zh) * 2021-01-19 2024-02-09 西安理工大学 基于深度抠图的逆光图像增强方法
CN112884776A (zh) * 2021-01-22 2021-06-01 浙江大学 一种基于合成数据集增广的深度学习抠图方法
CN112884776B (zh) * 2021-01-22 2022-05-31 浙江大学 一种基于合成数据集增广的深度学习抠图方法
CN113487630A (zh) * 2021-07-14 2021-10-08 辽宁向日葵教育科技有限公司 一种基于材质分析技术的抠像方法

Also Published As

Publication number Publication date
CN108460770A (zh) 2018-08-28
CN108460770B (zh) 2020-03-10

Similar Documents

Publication Publication Date Title
WO2018107825A1 (zh) 抠图方法及装置
US11170482B2 (en) Image processing method and device
WO2021164228A1 (zh) 一种图像数据的增广策略选取方法及***
US10740881B2 (en) Deep patch feature prediction for image inpainting
CN108229488B (zh) 用于检测物体关键点的方法、装置及电子设备
US9344690B2 (en) Image demosaicing
JP4750443B2 (ja) 単一のイメージからのラジオメトリック較正
CN107679466B (zh) 信息输出方法和装置
CN111091567B (zh) 医学图像配准方法、医疗设备及存储介质
WO2020253508A1 (zh) 异常细胞检测方法、装置及计算机可读存储介质
KR20180065889A (ko) 타겟의 검측 방법 및 장치
WO2020134533A1 (zh) 深度模型训练方法及装置、电子设备及存储介质
EP3671544A1 (en) Image processing method and information processing device
CN112241976A (zh) 一种训练模型的方法及装置
US20210358081A1 (en) Information processing apparatus, control method thereof, imaging device, and storage medium
CN110930296A (zh) 图像处理方法、装置、设备及存储介质
CN112884782B (zh) 生物对象分割方法、装置、计算机设备和存储介质
CN113439227A (zh) 放大图像的捕获和存储
WO2021097595A1 (zh) 图像的病变区域分割方法、装置及服务器
CN107993239B (zh) 一种计算单目图像的深度次序的方法和装置
CN112884725B (zh) 针对用于细胞判别的神经网络模型输出结果的修正方法
CN111062984B (zh) 视频图像区域面积的测量方法、装置、设备及存储介质
CN112330671A (zh) 细胞分布状态的分析方法、装置、计算机设备和存储介质
CN114926876A (zh) 图像关键点检测方法、装置、计算机设备和存储介质
CN112613379A (zh) 年龄估计方法及装置、电子设备及计算机可读存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17880157

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17880157

Country of ref document: EP

Kind code of ref document: A1