CN113409331B - Image processing method, image processing device, terminal and readable storage medium - Google Patents

Image processing method, image processing device, terminal and readable storage medium

Info

Publication number
CN113409331B
CN113409331B (application CN202110636329.7A)
Authority
CN
China
Prior art keywords
image
probability
depth
value
pixel
Prior art date
Legal status
Active
Application number
CN202110636329.7A
Other languages
Chinese (zh)
Other versions
CN113409331A (en)
Inventor
戴夏强
Current Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202110636329.7A
Publication of CN113409331A
Application granted
Publication of CN113409331B

Classifications

    • G PHYSICS · G06 COMPUTING; CALCULATING OR COUNTING · G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis · G06T7/10 Segmentation; Edge detection · G06T7/11 Region-based segmentation
    • G06T5/00 Image enhancement or restoration · G06T5/50 Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • G06T7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G06T2207/00 Indexing scheme for image analysis or image enhancement · G06T2207/10 Image acquisition modality · G06T2207/10004 Still image; Photographic image
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20 Special algorithmic details · G06T2207/20076 Probabilistic image processing
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/20212 Image combination · G06T2207/20221 Image fusion; Image merging
    • G06T2207/30 Subject of image; Context of image processing · G06T2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an image processing method, an image processing device, a terminal and a non-volatile computer readable storage medium. The image processing method comprises the steps of: performing portrait segmentation processing on an acquired original image to obtain an initial segmented image and a first probability image, wherein the initial segmented image comprises a portrait region and a background region, the first probability image comprises a first probability value of each pixel in the initial segmented image, and the first probability value characterizes the probability that each pixel in the initial segmented image is in the portrait region; acquiring a depth image and a second probability image, wherein the depth image indicates the depth value of each pixel in the original image, and the second probability image comprises a second probability value corresponding to each pixel in the depth image; and acquiring a target segmented image according to the initial segmented image, the first probability image, the depth image and the second probability image. Compared with directly segmenting with a single segmentation model, the method can avoid problems such as false detections and missed detections in complex scenes.

Description

Image processing method, image processing device, terminal and readable storage medium
Technical Field
The present invention relates to the field of image technology, and in particular, to an image processing method, an image processing device, a terminal, and a readable storage medium.
Background
At present, a portrait region in an image is generally acquired by semantic segmentation or matting, and both approaches classify foreground and background by integrating image features. However, if the scene of the image is complex, it is difficult to determine whether some features belong to the foreground or the background, so false detections and missed detections easily occur.
Disclosure of Invention
The embodiment of the application provides an image processing method, an image processing device, a terminal and a nonvolatile computer readable storage medium.
The embodiment of the application provides an image processing method. The image processing method comprises the steps of: performing portrait segmentation processing on an acquired original image to obtain an initial segmented image and a first probability image, wherein the initial segmented image comprises a portrait region and a background region, the first probability image comprises a first probability value of each pixel in the initial segmented image, and the first probability value characterizes the probability that each pixel in the initial segmented image is in the portrait region; acquiring a depth image and a second probability image, wherein the depth image is used for indicating the depth value of each pixel in the original image, and the second probability image comprises a second probability value corresponding to each pixel in the depth image; and acquiring a target segmented image according to the initial segmented image, the first probability image, the depth image and the second probability image.
The embodiment of the application also provides an image processing device. The image processing device comprises a first acquisition module, a second acquisition module and a fusion module. The first acquisition module is configured to perform portrait segmentation processing on an acquired original image to obtain an initial segmented image and a first probability image, wherein the initial segmented image comprises a portrait region and a background region, the first probability image comprises a first probability value of each pixel in the initial segmented image, and the first probability value characterizes the probability that each pixel in the initial segmented image is in the portrait region. The second acquisition module is configured to acquire a depth image and a second probability image, wherein the depth image is used for indicating the depth value of each pixel in the original image, and the second probability image comprises a second probability value corresponding to each pixel in the depth image. The fusion module is configured to acquire a target segmented image according to the initial segmented image, the first probability image, the depth image and the second probability image.
The embodiment of the application also provides a terminal. The terminal comprises a housing and one or more processors coupled to the housing. The one or more processors are configured to: perform portrait segmentation processing on an acquired original image to obtain an initial segmented image and a first probability image, wherein the initial segmented image comprises a portrait region and a background region, the first probability image comprises a first probability value of each pixel in the initial segmented image, and the first probability value characterizes the probability that each pixel in the initial segmented image is in the portrait region; acquire a depth image and a second probability image, wherein the depth image is used for indicating the depth value of each pixel in the original image, and the second probability image comprises a second probability value corresponding to each pixel in the depth image; and acquire a target segmented image according to the initial segmented image, the first probability image, the depth image and the second probability image.
The embodiment of the application also provides a non-volatile computer readable storage medium containing a computer program. The computer program, when executed by a processor, causes the processor to perform the image processing method described below. The image processing method comprises the steps of: performing portrait segmentation processing on an acquired original image to obtain an initial segmented image and a first probability image, wherein the initial segmented image comprises a portrait region and a background region, the first probability image comprises a first probability value of each pixel in the initial segmented image, and the first probability value characterizes the probability that each pixel in the initial segmented image is in the portrait region; acquiring a depth image and a second probability image, wherein the depth image is used for indicating the depth value of each pixel in the original image, and the second probability image comprises a second probability value corresponding to each pixel in the depth image; and acquiring a target segmented image according to the initial segmented image, the first probability image, the depth image and the second probability image.
According to the image processing method, the image processing device, the terminal and the non-volatile computer readable storage medium of the embodiments of the application, the target segmented image is obtained according to the initial segmented image, the first probability image, the depth image and the second probability image. Compared with directly segmenting the original image with a single segmentation model, problems such as false detections and missed detections in complex scenes can be avoided, and the stability and accuracy of portrait region segmentation are improved.
Additional aspects and advantages of embodiments of the application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the present application will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings, in which:
FIG. 1 is a flow diagram of an image processing method in some embodiments of the present application;
FIG. 2 is a schematic diagram of an image processing apparatus in some embodiments of the present application;
FIG. 3 is a schematic diagram of a terminal in some embodiments of the present application;
FIGS. 4-5 are flow diagrams of image processing methods in certain embodiments of the present application;
FIG. 6 is a schematic illustration of an original image captured in landscape orientation in some embodiments of the present application;
FIG. 7 is a schematic illustration of an original image captured in portrait orientation in some embodiments of the present application;
FIG. 8 is a schematic diagram of rotating a landscape image into a portrait image in some embodiments of the present application;
FIGS. 9-11 are schematic illustrations of segmentation of a preprocessed image by a segmentation model in some embodiments of the present application;
FIG. 12 is a flow chart of an image processing method in some embodiments of the present application;
FIG. 13 is a schematic diagram of a depth estimation network model in some embodiments of the present application;
FIG. 14 is a schematic diagram of a monocular depth estimation network model in some embodiments of the present application;
FIG. 15 is a schematic diagram of a binocular depth estimation network model in certain embodiments of the present application;
FIGS. 16-17 are flow diagrams of image processing methods in certain embodiments of the present application;
FIG. 18 is a schematic diagram of first image processing performed on an original depth information image according to an initial segmentation image in some embodiments of the present application;
FIGS. 19-20 are flow diagrams of image processing methods in certain embodiments of the present application;
FIG. 21 is a schematic diagram of generating a segmented image of a target in some embodiments of the present application;
FIGS. 22-23 are flow diagrams of image processing methods in certain embodiments of the present application;
FIG. 24 is a schematic illustration of interactions of a non-volatile computer-readable storage medium with a processor in certain embodiments of the present application.
Detailed Description
Embodiments of the present application are described in detail below, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to like or similar elements or elements having like or similar functions throughout. The embodiments described below by referring to the drawings are exemplary only for explaining the embodiments of the present application and are not to be construed as limiting the embodiments of the present application.
Referring to fig. 1, an embodiment of the present application provides an image processing method. The image processing method comprises the following steps:
01: Performing portrait segmentation processing on the obtained original image to obtain an initial segmented image and a first probability image, wherein the initial segmented image comprises a portrait region and a background region, the first probability image comprises a first probability value of each pixel in the initial segmented image, and the first probability value characterizes the probability that each pixel in the initial segmented image is in the portrait region;
02: acquiring a depth image and a second probability image, wherein the depth image is used for indicating the depth value of each pixel in the original image, and the second probability image comprises a second probability value corresponding to each pixel in the depth image; and
03: Acquiring a target segmented image according to the initial segmented image, the first probability image, the depth image and the second probability image.
Referring to fig. 2, the embodiment of the present application further provides an image processing apparatus 10. The image processing apparatus 10 includes a first acquisition module 11, a second acquisition module 12, and a fusion module 13. Step 01 in the above image processing method may be implemented by the first acquisition module 11; step 02 may be performed by the second acquisition module 12; and step 03 may be performed by the fusion module 13. That is, the first acquisition module 11 is configured to perform portrait segmentation processing on an acquired original image to obtain an initial segmented image and a first probability image, where the initial segmented image includes a portrait region and a background region, the first probability image includes a first probability value of each pixel in the initial segmented image, and the first probability value characterizes the probability that each pixel in the initial segmented image is in the portrait region. The second acquisition module 12 is configured to acquire a depth image and a second probability image, where the depth image is used to indicate the depth value of each pixel in the original image, and the second probability image includes a second probability value corresponding to each pixel in the depth image. The fusion module 13 is configured to acquire a target segmented image according to the initial segmented image, the first probability image, the depth image, and the second probability image.
Referring to fig. 3, the embodiment of the present application further provides a terminal 100. The terminal 100 includes a housing 20 and one or more processors 30, and the one or more processors 30 are coupled to the housing 20. Steps 01, 02 and 03 of the above image processing method may also be performed by the one or more processors 30. That is, the one or more processors 30 are configured to perform portrait segmentation processing on the acquired original image to obtain an initial segmented image and a first probability image, where the initial segmented image includes a portrait region and a background region, the first probability image includes a first probability value of each pixel in the initial segmented image, and the first probability value characterizes the probability that each pixel in the initial segmented image is in the portrait region; acquire a depth image and a second probability image, where the depth image is used to indicate the depth value of each pixel in the original image, and the second probability image includes a second probability value corresponding to each pixel in the depth image; and acquire a target segmented image according to the initial segmented image, the first probability image, the depth image and the second probability image. It should be noted that the terminal 100 may be a mobile phone, a camera, a notebook computer, an intelligent wearable device, or the like; the following embodiments are described by taking the terminal 100 being a mobile phone as an example.
According to the image processing method, the image processing apparatus 10 and the terminal 100 of the embodiments of the application, the target segmented image is obtained according to the initial segmented image, the first probability image, the depth image and the second probability image. Compared with directly segmenting the original image with a single segmentation model, problems such as false detections and missed detections in complex scenes can be avoided, and the stability and accuracy of portrait region segmentation are improved.
Wherein, in one example, the terminal 100 or the image processing apparatus 10 may further include an imaging module 40, and the terminal 100 or the image processing apparatus 10 may perform image acquisition on the person through the imaging module 40 to obtain an original image; in another example, the terminal 100 or the image processing apparatus 10 may further include a storage module 50, in which an original image including a portrait is stored in advance in the storage module 50, and the processor 30 of the terminal 100 may acquire the original image from the storage module 50; in still another example, the terminal 100 or the image processing apparatus 10 may acquire an original image containing a portrait through an input of a user. The specific method for obtaining the original image is not limited herein, and the obtained original image needs to contain a portrait.
After the original image is acquired, the processor 30 (or the first acquisition module 11) performs a portrait segmentation process on the acquired original image to acquire an initial segmentation image and a first probability image. Specifically, referring to fig. 1 and 4, in some embodiments, step 01: performing image segmentation processing on the obtained original image to obtain an initial segmented image and a first probability image, including:
011: preprocessing an original image to obtain a preprocessed image;
012: and inputting the preprocessed image into a preset segmentation model to acquire an initial segmentation image and a first probability image.
Referring to fig. 2, in some embodiments, the steps 011 and 012 may be performed by the first obtaining module 11 of the image processing apparatus 10. That is, the first obtaining module 11 is further configured to perform preprocessing on the original image to obtain a preprocessed image; and inputting the preprocessed image into a preset segmentation model to obtain an initial segmentation image and a first probability image.
Referring to fig. 3, in some embodiments, the steps 011 and 012 may be performed by one or more processors 30 of the terminal 100. That is, the one or more processors 30 are also configured to pre-process the original image to obtain a pre-processed image; and inputting the preprocessed image into a preset segmentation model to obtain an initial segmentation image and a first probability image.
For example, in some embodiments, the image input to the preset segmentation model needs to meet the requirements of the input image, that is, the preset segmentation model may have some requirements on the properties of the input image, and the input image should meet these requirements, so that the preset segmentation model can process correctly. Therefore, after the original image containing the portrait is acquired, the original image is preprocessed to acquire a preprocessed image, and the preprocessed image can meet the requirement of a preset segmentation model on the input image. Thus, after the preprocessed image is input into the preset segmentation model, the preset segmentation model can accurately process the preprocessed image. Specifically, referring to fig. 4 and 5, in some embodiments, step 011: preprocessing an original image to obtain a preprocessed image, including:
0111: detecting whether the original image is a landscape image or a portrait image;
0112: if the original image is a landscape image, rotating the original image into a portrait image, and performing normalization processing on the rotated original image to obtain a preprocessed image; and
0113: if the original image is a portrait image, performing normalization processing on the original image to obtain the preprocessed image.
Referring to fig. 2, in some embodiments, steps 0111, 0112, and 0113 may be performed by the first acquisition module 11 of the image processing apparatus 10. That is, the first acquisition module 11 is further configured to detect whether the original image is a landscape image or a portrait image; if the original image is a landscape image, rotate the original image into a portrait image and perform normalization processing on the rotated original image to obtain a preprocessed image; and if the original image is a portrait image, perform normalization processing on the original image to obtain the preprocessed image.
Referring to fig. 3, in some embodiments, steps 0111, 0112, and 0113 may be implemented by one or more processors 30 of the terminal 100. That is, the one or more processors 30 are further configured to detect whether the original image is a landscape image or a portrait image; if the original image is a landscape image, rotate the original image into a portrait image and perform normalization processing on the rotated original image to obtain a preprocessed image; and if the original image is a portrait image, perform normalization processing on the original image to obtain the preprocessed image.
For example, after an original image containing a portrait is acquired, it is detected whether the original image is a landscape image or a portrait image. Note that, if the terminal 100 or the image processing apparatus 10 is in the landscape mode when capturing the current original image, the current original image is a landscape image; if the terminal 100 or the image processing apparatus 10 is in the portrait mode when capturing the current original image, the current original image is a portrait image. In some embodiments, whether the original image is a landscape or portrait image may be detected from the width and height of the original image. For example, take the case in which the terminal 100 captures the current original image. As shown in fig. 6 and 7, the terminal 100 includes adjacent first and second sides 101 and 102, and the length of the first side 101 is greater than the length of the second side 102. The first side 101 is the long side of the terminal 100 and the second side 102 is the wide side of the terminal 100. The width w and the height h of the original image are obtained, wherein the length of the side of the original image parallel to the long side of the terminal 100 is the height h of the original image, and the length of the side of the original image parallel to the wide side of the terminal 100 is the width w of the original image. If the width w of the original image is greater than the height h (as shown in fig. 6), the original image is a landscape image; if the width w of the original image is smaller than the height h (as shown in fig. 7), the original image is a portrait image.
Referring to fig. 8, when it is determined that the current original image is a landscape image, the original image is rotated into a portrait image, and normalization processing is performed on the rotated original image to obtain a preprocessed image, which helps the preset segmentation model process the preprocessed image correctly. For example, in one example, normalization may be performed by dividing the pixel values of all pixels in the rotated original image by 255; in another example, normalization may also be performed by subtracting 127.5 from the pixel values of all pixels in the rotated original image and then dividing the differences by 127.5, which is not limited herein. Further, in some embodiments, after the original image captured in landscape orientation is rotated into a portrait image, the rotated original image is scaled to a preset size, and the scaled original image is then normalized, wherein the preset size is the size of the input image required by the preset segmentation model.
When the current original image is determined to be a portrait image, normalization processing is performed directly on the original image to obtain a preprocessed image, so that the preprocessed image can be correctly processed by the preset segmentation model. The specific manner of normalization is the same as that of normalizing the rotated original image, and will not be repeated here. Of course, in some embodiments, the original image may first be scaled to the preset size and the scaled original image then normalized.
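For illustration only, the preprocessing described above (orientation detection, rotation, scaling and normalization) could be sketched as follows in Python with OpenCV; the 512x512 preset size and the divide-by-255 normalization variant are assumptions used for the example, not values required by the application:

import cv2
import numpy as np

def preprocess(original, preset_size=(512, 512)):  # preset size is assumed
    h, w = original.shape[:2]
    if w > h:  # landscape image: rotate into a portrait image
        original = cv2.rotate(original, cv2.ROTATE_90_CLOCKWISE)
    # Scale to the input size required by the preset segmentation model.
    resized = cv2.resize(original, preset_size, interpolation=cv2.INTER_LINEAR)
    # Normalize by dividing all pixel values by 255 (the (x - 127.5) / 127.5
    # variant mentioned above would work equally well).
    return resized.astype(np.float32) / 255.0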
Referring to fig. 9, after obtaining the preprocessed image, the processor 30 (or the first acquisition module 11) inputs the preprocessed image into a preset segmentation model to obtain an initial segmented image and a first probability image. The initial segmented image includes a portrait region (e.g., the white region of the initial segmented image in fig. 9) and a background region (e.g., the black region of the initial segmented image in fig. 9), the first probability image includes a first probability value I1 for each pixel in the initial segmented image, and the first probability value I1 characterizes the probability that each pixel in the initial segmented image is in the portrait region. That is, the larger the first probability value I1 corresponding to a certain pixel in the initial segmented image, the higher the probability that the pixel is in the portrait region; similarly, the smaller the first probability value I1 corresponding to a certain pixel, the lower the probability that the pixel is in the portrait region. In particular, in some embodiments, the initial segmented image includes a segmentation value for each pixel, and the segmentation value is in the range of 0 to 1. If the segmentation value of a pixel in the initial segmented image is greater than a first preset segmentation value, the region where the pixel is located is considered to be the portrait region; if the segmentation value is not greater than the first preset segmentation value, the region where the pixel is located is considered to be the background region.
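As a minimal sketch of the thresholding just described (the first preset segmentation value of 0.5 is an assumed example, not a value fixed by the application):

import numpy as np

def binarize(segmentation_values, first_preset_value=0.5):
    # segmentation_values: per-pixel values in [0, 1] output by the model.
    # Pixels above the threshold are treated as portrait region (1),
    # the remaining pixels as background region (0).
    return (segmentation_values > first_preset_value).astype(np.uint8)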
Referring to fig. 10, in one embodiment, the preset segmentation model includes an encoding module and a decoding module. The encoding module convolves and pools the preprocessed image input into the preset segmentation model multiple times to obtain a feature image, and the feature image contains portrait feature information. The decoding module is used for acquiring the initial segmented image containing the portrait region and the first probability image according to the portrait feature information in the feature image. Specifically, referring to fig. 11, in some embodiments, the preset segmentation model further includes two parts, namely a Semantic segmentation part and a Detail segmentation part, where the Detail segmentation part is used to extract micro-features of the preprocessed image and the Semantic segmentation part is used to extract macro-features of the preprocessed image; after the macro-features are fused with the micro-features, the initial segmented image containing the portrait region and the first probability image are acquired according to the fused features. The preset segmentation model thus integrates features of different levels (micro-features and macro-features) to meet the requirements of both coarse segmentation and fine segmentation, which helps improve the segmentation accuracy of the preset segmentation model. It should be noted that, in some embodiments, the preset segmentation model may adopt the MODNet network structure or a Spectral imaging network structure; of course, the preset segmentation model may also adopt other network structures, as long as the initial segmented image containing the portrait region and the first probability image can be acquired, which is not limited herein.
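A minimal PyTorch-style sketch of such a two-branch structure (a semantic branch for macro-features, a detail branch for micro-features, followed by fusion) is given below; the layer sizes, the concatenation-based fusion and the 0.5 threshold are illustrative assumptions and do not reproduce the MODNet structure or any specific network mentioned above:

import torch
import torch.nn as nn

class TwoBranchSegNet(nn.Module):
    # Illustrative only: macro (semantic) and micro (detail) features are
    # extracted separately, fused, and mapped to a portrait probability map.
    def __init__(self):
        super().__init__()
        self.semantic = nn.Sequential(  # macro-features at reduced resolution
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Upsample(scale_factor=4, mode='bilinear', align_corners=False))
        self.detail = nn.Sequential(    # micro-features at full resolution
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(64, 1, 1)

    def forward(self, x):
        fused = torch.cat([self.semantic(x), self.detail(x)], dim=1)
        prob = torch.sigmoid(self.head(fused))   # first probability image
        seg = (prob > 0.5).float()                # initial segmented image
        return seg, prob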
Furthermore, in some embodiments, the image processing method further comprises iteratively training an initial segmentation model based on a sample image set comprising a plurality of sample images to obtain the preset segmentation model. Specifically, a portrait region is labeled in each sample image, and the sample image is input into the initial segmentation model to obtain a training image containing a portrait region. The value of the loss function of the initial segmentation model is then acquired according to the difference between the portrait region of the training image and the portrait region labeled in the sample image. After the value of the loss function is obtained, the initial model can be iteratively trained according to the value of the loss function to obtain the segmentation model. In some embodiments, the Adam optimizer may be used to iteratively train the initial model according to the loss function until the loss value of the output result of the initial model converges, and the model at this point is saved as the trained segmentation model. The Adam optimizer combines the advantages of two optimization algorithms, AdaGrad (Adaptive Gradient) and RMSProp, and comprehensively considers the first moment estimation (First Moment Estimation, i.e., the mean of the gradient) and the second moment estimation (Second Moment Estimation, i.e., the uncentered variance of the gradient) of the gradient to calculate the update step.
It should be noted that the termination condition of the iterative training may include: the number of training iterations reaches a target number; or the loss value of the output result of the initial model satisfies a set convergence condition. In one example, the convergence condition is to make the total loss value as small as possible; an initial learning rate of 1e-3 is used, the learning rate decays with the cosine of the step number, batch_size=8, and after training for 16 epochs the model can be considered converged. Here batch_size is the batch parameter, whose upper limit is the total number of samples in the training set, and an epoch refers to one pass of training over all samples in the training set; colloquially, the number of epochs is the number of times the whole data set is looped over, and 1 epoch equals one round of training with all samples in the training set. In another example, the loss value satisfying the set convergence condition may mean that the total loss value Loss is smaller than a set threshold. Of course, the specific conditions are not limited here.
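The iterative training described above could be sketched as follows; the model, dataset and binary cross-entropy loss are placeholders (the application does not fix the loss function here), while the initial learning rate of 1e-3, the cosine decay, batch_size=8 and 16 epochs follow the example given in this paragraph:

import torch
from torch.utils.data import DataLoader

def train(model, dataset, epochs=16, batch_size=8):
    # model: any segmentation network returning a per-pixel probability map.
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Learning rate decays with the cosine of the step number.
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(
        optimizer, T_max=epochs * len(loader))
    loss_fn = torch.nn.BCELoss()  # difference from the labeled portrait region
    for _ in range(epochs):
        for image, label in loader:
            prob = model(image)
            loss = loss_fn(prob, label)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            scheduler.step()
    return model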
In some embodiments, the trained segmentation model may be stored locally in the terminal 100 (or the image processing apparatus 10), for example, in the storage module 50, or may be stored in a server communicatively connected to the terminal 100 (or the image processing apparatus 10), so that the storage space of the terminal 100 (or the image processing apparatus 10) is reduced, and the operation efficiency of the terminal 100 (or the image processing apparatus 10) is improved. Of course, in some embodiments, the segmentation model may also acquire new training data periodically or aperiodically, and train and update the segmentation model. For example, when there is a person image that is mistakenly segmented, the person image may be used as a sample image, and after the sample image is labeled, the training is performed again by the training method, so that the accuracy of the segmentation model may be improved.
In some embodiments, the original image may be input into a preset depth estimation network model, so that a depth image including the depth information of the initial segmentation image and the second probability image can be directly obtained. Specifically, referring to fig. 1 and 12, in some embodiments, the second probability value characterizes a probability that a depth value of each pixel in the depth image is a corresponding depth value, step 02: acquiring a depth image and a second probability image, including:
021: preprocessing an original image to obtain a preprocessed image; and
022: the preprocessed image is input to the depth estimation network model to obtain a depth image and a second probability image.
Referring to fig. 2, in some embodiments, both the steps 021 and 022 may be performed by the second obtaining module 12 of the image processing apparatus 10. That is, the second obtaining module 12 is further configured to perform preprocessing on the original image to obtain a preprocessed image; and inputting the preprocessed image into the depth estimation network model to obtain a depth image and a second probability image.
Referring to fig. 3, in some embodiments, both steps 021 and 022 may be implemented by one or more processors 30 of the terminal 100. That is, the one or more processors 30 are also configured to pre-process the original image to obtain a pre-processed image; and inputting the preprocessed image into the depth estimation network model to obtain a depth image and a second probability image.
Likewise, in some embodiments, the image input to the preset depth estimation network model needs to meet the requirements of the input image, that is, the preset depth estimation network model may have some requirements on the properties of the input image, and the input image should meet these requirements, so that the preset depth estimation network model can process correctly. Therefore, before the original image is input into the preset depth estimation network model, the original image needs to be preprocessed, and a specific implementation manner of preprocessing the original image to obtain the preprocessed image is the same as that of preprocessing the original image to obtain the preprocessed image in the above embodiment, which is not described herein. It should be noted that, in order to enable the depth image output by the depth estimation network model to correspond to the initial segmentation image, in some embodiments, the preprocessing performed before the original image enters the segmentation model is identical to the preprocessing performed before the original image enters the depth estimation network model. Specifically, in some embodiments, the original image may be preprocessed to obtain a preprocessed image, and then the preprocessed image may be respectively input into the segmentation model and the depth estimation network model for processing. Therefore, the depth image output by the depth estimation network model can be enabled to correspond to the initial segmentation image, and the image processing speed can be increased without carrying out preprocessing on the original image twice.
Referring to fig. 13, after the preprocessed image is obtained, the preprocessed image is input into a preset depth estimation network model to obtain a depth image and a second probability image. The depth image indicates the depth value of each pixel point in the original image, the second probability image comprises a second probability value I2 corresponding to each pixel in the depth image, and the second probability value I2 represents the probability that the depth value of each pixel in the depth image is a corresponding value. For example, if the depth value of the pixel point located in the 1 st row and 1 st column in the depth image is 0.5 and the second probability value I2 located in the 1 st row and 1 st column in the second probability image is 80%, the probability that the depth value of the pixel point located in the 1 st row and 1 st column in the depth image is 0.5 is 80%.
Referring to fig. 14, in some embodiments, the preset depth estimation network model may be a monocular depth estimation network. The monocular depth estimation network is a trained network obtained in advance by supervised learning on a large amount of training data, and can output the depth image and the second probability image corresponding to the preprocessed image, that is, the depth image and the second probability image corresponding to the initial segmented image. The monocular depth estimation network may be obtained using a deep-learning algorithm, such as a CNN (Convolutional Neural Network), the U-Net algorithm, an FCN (Fully Convolutional Network), and the like. In one embodiment, the monocular depth estimation network includes an Encoder module and a Decoder module, where the Encoder module is implemented with a backbone such as MobileNet, ResNet or VGG; the encoding module is a feature extraction module configured to perform operations such as convolution, activation and pooling on the preprocessed image to extract the features of the input image. The decoding module is used to perform convolution, activation, softmax calculation and other processing on the image features to obtain the depth image and the second probability image, where each second probability value I2 in the second probability image characterizes the probability that the depth value of the corresponding pixel in the depth image is the corresponding value. As shown in fig. 14, which is a schematic diagram of the network structure of the monocular depth estimation network for obtaining depth information, the preprocessed image is input into the encoder-decoder structure, i.e., the monocular depth estimation network, to obtain the depth image and the second probability image.
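One possible reading of such a decoder (an assumption for illustration, not necessarily the structure adopted in this application) is that depth is predicted as a distribution over discrete candidate depth values, so that the softmax gives, for every pixel, both a depth value and the probability that the depth takes that value (the second probability value I2); a sketch:

import torch
import torch.nn as nn

class MonoDepthHead(nn.Module):
    # Assumed illustration: the decoder outputs logits over n_bins candidate
    # depth values per pixel; softmax turns them into a per-pixel distribution.
    def __init__(self, in_channels=64, n_bins=64, max_depth=10.0):
        super().__init__()
        self.logits = nn.Conv2d(in_channels, n_bins, 1)
        self.register_buffer('bins', torch.linspace(0.0, max_depth, n_bins))

    def forward(self, features):
        dist = torch.softmax(self.logits(features), dim=1)  # (N, n_bins, H, W)
        conf, idx = dist.max(dim=1)   # conf: second probability image I2
        depth = self.bins[idx]        # depth image (most likely candidate value)
        return depth, conf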
Referring to fig. 15, in some embodiments, the preset depth estimation network model may be a binocular depth estimation network. In this case the imaging module 40 includes a left camera and a right camera, which can acquire images separately. Similarly, the binocular depth estimation network is a trained network obtained in advance by supervised learning on a large amount of training data, and can output the depth image and the second probability image corresponding to the preprocessed image, that is, the depth image and the second probability image corresponding to the initial segmented image. It should be noted that the original image may be the left image (acquired by the left camera) or the right image (acquired by the right camera), which is not limited herein. In one embodiment, as shown in fig. 15, the binocular depth estimation network includes two branches, one being a cost volume branch and the other being a feature filtering branch. The left and right cameras collect a left image and a right image at the same time, and the left image and the right image are preprocessed to obtain a preprocessed left image and a preprocessed right image. The preprocessed left image and the preprocessed right image are input into the binocular depth estimation network: both preprocessed images enter the cost volume branch for convolution, batch normalization (Batch Normalization), activation and other processing, and are then convolved by a convolution layer to obtain a first result, while the preprocessed right image enters the feature filtering branch for multiple convolutions and activations to obtain a second result. After the first result and the second result are obtained, they are input into a joint filter to be convolved and activated, and the obtained result is processed through three convolutions and a soft argmax calculation to obtain the depth image. The first result and the second result are also input into the joint filter to be convolved and activated, and the obtained result is processed through convolution, softmax calculation and the like to obtain the second probability image, where each second probability value I2 in the second probability image characterizes the probability that the depth value of the corresponding pixel in the depth image is the corresponding value.
Referring to fig. 1 and 16, in some embodiments, the image processing method further includes:
04: Acquiring an original depth information image by using a depth information acquisition module.
At this time, step 02: acquiring the depth image and the second probability image, further comprising:
023: performing first image processing on the original depth information image according to the initial segmentation image to obtain a depth image corresponding to the initial segmentation image; and
024: and acquiring a corresponding second probability value according to the depth value of each pixel in the depth image so as to acquire a second probability image.
Referring to fig. 2, in some embodiments, the image processing apparatus 10 further includes a depth information acquisition module 60, and the depth information acquisition module 60 acquires an original depth information image. Steps 023 and 024 may each be implemented by execution of the second acquisition module 12 of the image processing apparatus 10. That is, the second obtaining module 12 is further configured to perform a first image processing on the original depth information image according to the initial segmentation image to obtain a depth image corresponding to the initial segmentation image; and acquiring a corresponding second probability value according to the depth value of each pixel in the depth image so as to acquire a second probability image.
Referring to fig. 3, in some embodiments, the terminal 100 further includes a depth information acquisition module 60, and the depth information acquisition module 60 acquires an original depth information image. Steps 023 and 024 may each be implemented by one or more processors 30 of terminal 100. That is, the one or more processors 30 are further configured to perform a first image processing on the original depth information image according to the initial segmentation image to obtain a depth image corresponding to the initial segmentation image; and acquiring a corresponding second probability value according to the depth value of each pixel in the depth image so as to acquire a second probability image.
It should be noted that, in some embodiments, the depth information acquisition module 60 and the imaging module 40 acquire the original depth information image and the original image at the same time; or the time difference between the acquisition of the original depth information image by the depth information acquisition module 60 and the acquisition of the original image by the imaging module 40 is smaller than a preset time difference. This avoids the situation in which a large time difference between the two acquisitions causes the image information in the original depth information image to no longer match the image information of the original image.
Furthermore, in one example, the depth information acquisition module 60 may be a time of flight module (TOF module). Illustratively, the depth acquisition module 60 includes a light emitter for emitting infrared light to the photographed object, and a light receiver for receiving the infrared light reflected by the photographed object, and obtaining an original depth information image according to a time difference or a phase difference between the emitted infrared light and the received reflected infrared light. In another example, the depth information acquisition module 60 may also be a structured light module. Illustratively, the depth acquisition module 60 includes a structured light projector for projecting a laser image pattern onto a photographed object, and a structured light camera for acquiring a reflected laser image pattern of the photographed object and acquiring an original depth information image according to the received laser image pattern. Of course, other ways may be used to obtain the original depth information image, for example, a binocular camera is used to obtain two frames of images, and the original depth information image is obtained according to the parallax between the images obtained by the left and right cameras, which is not limited herein.
The original depth information image corresponds to the original image and comprises a depth value for each pixel point in the image. Since the initial segmented image is obtained by preprocessing and portrait segmentation of the original image, first image processing needs to be performed on the original depth information image in order to obtain the depth image corresponding to the initial segmented image. Specifically, referring to fig. 16 and 17, in some embodiments, step 023: performing first image processing on the original depth information image according to the initial segmentation image to obtain a depth image corresponding to the initial segmentation image, includes:
0231: performing image alignment processing on the original depth information image to obtain an original depth information image aligned with the initial segmentation image; and
0232: and performing interpolation scaling and Gaussian blur processing on the aligned original depth information image to obtain a depth image corresponding to the initial segmentation image.
Referring to fig. 2, in some embodiments, both the steps 0231 and 0232 may be performed by the second obtaining module 12 of the image processing apparatus 10. That is, the second obtaining module 12 is further configured to perform image alignment processing on the original depth information image to obtain an original depth information image aligned with the initial segmentation image; and performing interpolation scaling and Gaussian blur processing on the aligned original depth information image to obtain a depth image corresponding to the initial segmentation image.
Referring to fig. 3, in some embodiments, both steps 0231 and 0232 may be implemented by one or more processors 30 of the terminal 100. That is, the one or more processors 30 are further configured to perform an image alignment process on the original depth information image to obtain an original depth information image aligned with the initial segmentation image; and performing interpolation scaling and Gaussian blur processing on the aligned original depth information image to obtain a depth image corresponding to the initial segmentation image.
Because the original depth information image and the original image are not acquired by the same imaging device, the coordinate system corresponding to the original depth information image is not the same as the coordinate system corresponding to the original image, i.e. the coordinate system corresponding to the original depth information image is not the same as the coordinate system corresponding to the original segmentation image. After the initial segmentation image and the original depth information image are acquired, the original depth information image needs to be subjected to image alignment processing so as to acquire the original depth information image aligned with the initial segmentation image, and a coordinate system corresponding to the aligned original depth information image is the same as a coordinate system corresponding to the initial segmentation image. Specifically, in some embodiments, the pixel coordinates of the pixel points in the original depth information image are converted into the coordinates in the world coordinate system according to the internal parameters and the external parameters of the depth camera (the camera for acquiring the original depth information image), and then the coordinates in the world coordinate system are converted into the pixel coordinates in the original image coordinate system according to the internal parameters and the external parameters of the color camera (the camera for acquiring the original image), that is, the coordinates in the world coordinate system are converted into the pixel coordinates in the initial segmentation image coordinate system. For example, firstly, calculating the three-dimensional point coordinate of a pixel point in an original depth information image under a depth image coordinate system according to the two-dimensional point coordinate of the pixel point in the original depth information image under the depth image coordinate system and the internal reference of a depth camera (a camera for acquiring the original depth information image); then calculating the coordinates of the pixel point in the world coordinate system according to the three-dimensional point coordinates of the pixel point in the depth camera coordinate system and the external parameter matrix converted from the depth camera coordinate system to the world coordinate system; then, according to the coordinate of the point under the world coordinate system and the external parameter matrix converted from the world coordinate system to the coordinate system of the color camera (the camera for acquiring the original image), calculating the three-dimensional point coordinate of the point under the coordinate system of the color camera; and finally, calculating the pixel coordinates of the point under the color camera coordinate system according to the three-dimensional point coordinates under the color camera coordinate system and the internal reference matrix of the color camera. And performing the coordinate conversion on all pixel points in the original depth information image to obtain an aligned original depth information image, wherein a coordinate system corresponding to the aligned original depth information image is the same as a coordinate system corresponding to the original image, namely, the coordinate system corresponding to the aligned original depth information image is the same as a coordinate system corresponding to the initial segmentation image. 
Of course, in some embodiments, other embodiments may also be used to perform image alignment processing on the original depth information image to obtain the original depth information image aligned with the original segmentation image, which is not limited herein.
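Purely as an illustrative sketch of the pixel-coordinate conversion chain described above (depth-image pixel to depth-camera 3D point, to world point, to color-camera 3D point, to pixel in the original image coordinate system), assuming pinhole intrinsic matrices K_depth and K_color and 4x4 extrinsic matrices T_depth2world and T_world2color:

import numpy as np

def align_depth_pixel(u, v, depth, K_depth, T_depth2world, T_world2color, K_color):
    # 2D pixel plus depth -> 3D point in the depth camera coordinate system.
    xyz_depth = depth * (np.linalg.inv(K_depth) @ np.array([u, v, 1.0]))
    # Depth camera coordinate system -> world coordinate system.
    xyz_world = T_depth2world @ np.append(xyz_depth, 1.0)
    # World coordinate system -> color camera coordinate system.
    xyz_color = (T_world2color @ xyz_world)[:3]
    # 3D point -> pixel coordinates in the color (original) image.
    uvw = K_color @ xyz_color
    return uvw[0] / uvw[2], uvw[1] / uvw[2]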
Referring to fig. 18, after the aligned original depth information image is obtained, although the coordinate system corresponding to the aligned original depth information image is the same as the coordinate system corresponding to the initial segmentation image, since the initial segmentation image is obtained by preprocessing the original image and then performing segmentation processing, the size of the initial segmentation image may be different from the size of the aligned original depth information image. Therefore, interpolation scaling processing is required to be performed on the aligned original depth information image to obtain a scaled original depth information image, and the size of the scaled original depth information image corresponds to the size of the initial divided image. Specifically, in some embodiments, a bilinear interpolation algorithm may be used to perform interpolation scaling on the aligned original depth information image. The bilinear interpolation algorithm is a better image scaling algorithm, and fully utilizes the depth values of four actually existing pixels around a virtual point in the aligned original depth information image to jointly determine a depth value in the scaled original depth information image. Of course, in some embodiments, the aligned original depth information image may also be interpolated and scaled by a nearest neighbor interpolation algorithm to obtain a scaled original depth information image corresponding to the size of the initial segmented image, which is not limited herein.
Referring to fig. 18, in some embodiments, after a scaled original depth information image corresponding to a size of an initial segmentation image is acquired, a gaussian blur process is performed on the scaled original depth information image to obtain a depth image corresponding to the initial segmentation image. As Gaussian blur processing is carried out, noise points of the scaled original depth information image can be removed, so that the depth value of each pixel point in the depth image is smoother, and the subsequent image processing is facilitated. Of course, in some embodiments, the scaled original depth information image may be directly used as the depth image without performing gaussian blur processing on the scaled original depth information image, which is not limited herein.
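An illustrative sketch of the interpolation scaling and Gaussian blur step, assuming OpenCV and a 5x5 Gaussian kernel (the kernel size is an assumption, not specified above):

import cv2

def to_depth_image(aligned_depth, seg_height, seg_width):
    # Bilinear interpolation scaling to the size of the initial segmented image.
    scaled = cv2.resize(aligned_depth, (seg_width, seg_height),
                        interpolation=cv2.INTER_LINEAR)
    # Gaussian blur removes noise so the depth values become smoother.
    return cv2.GaussianBlur(scaled, (5, 5), 0)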
Referring to fig. 16 and 19, in some embodiments, step 024: acquiring a corresponding second probability value according to the depth value of each pixel in the depth image, including:
0241: calculating the ratio between the depth value of each pixel in the depth image and the maximum depth value in the depth image;
0242: and calculating a difference value between the preset value and the ratio to obtain a second probability value of each pixel in the depth image.
Referring to fig. 2, in some embodiments, both the step 0241 and the step 0242 may be performed by the second obtaining module 12 of the image processing apparatus 10. That is, the second obtaining module 12 is further configured to calculate a ratio between the depth value of each pixel in the depth image and the maximum depth value in the depth image; and calculating a difference value between the preset numerical value and the ratio value to obtain a second probability value of each pixel in the depth image.
Referring to fig. 3, in some embodiments, both the step 0241 and the step 0242 may be implemented by one or more processors 30 of the terminal 100. That is, the one or more processors 30 are also configured to calculate a ratio between a depth value for each pixel in the depth image and a maximum depth value in the depth image; and calculating a difference value between the preset numerical value and the ratio value to obtain a second probability value of each pixel in the depth image.
After acquiring a depth image corresponding to the initial segmentation image by performing the first image processing on the original depth information image acquired by the hardware (the depth information acquisition module 60), the processor 30 (or the second obtaining module 12) calculates, for each pixel in the depth image, the ratio between the depth value of that pixel and the maximum depth value in the depth image, and then calculates the difference between the preset value and the ratio to obtain the second probability value I2 corresponding to each pixel in the depth image. The second probability values corresponding to all the pixels are arranged to generate the second probability image. For example, in one embodiment, the preset value is 1. Assuming that the depth value of the pixel located in the 1st row and 1st column of the depth image is d1 and the maximum depth value in the depth image is d_max, the second probability value I2 corresponding to that pixel is (1 - d1/d_max), and the second probability value I2 located in the 1st row and 1st column of the second probability image is (1 - d1/d_max). Thus, in the depth image, the larger the depth value of a pixel, the smaller its second probability value I2, that is, the more likely a pixel belongs to the background area, the smaller its second probability value I2; conversely, the smaller the depth value, the larger the second probability value I2, that is, the more likely a pixel belongs to the portrait area, the larger its second probability value I2.
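A minimal sketch of steps 0241 and 0242, assuming the depth image is a numpy array with a positive maximum; the preset value of 1 follows the example above.

```python
import numpy as np

def second_probability_image(depth_image, preset_value=1.0):
    """I2 = preset_value - depth / max_depth: nearer pixels get larger probabilities."""
    d_max = depth_image.max()
    return preset_value - depth_image / d_max
```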
Referring to fig. 1 and 20, in some embodiments, the initial segmentation image includes a segmentation value for each pixel, and step 03: obtaining a target segmentation image according to the initial segmentation image, the first probability image, the depth image and the second probability image, includes:
031: obtaining a target segmentation value of the pixel Pi"j" at the position corresponding to the pixel Pij in the target segmentation image according to the segmentation value of each pixel Pij in the initial segmentation image, the first probability value of the pixel Pij, the depth value of the pixel Pi'j' corresponding to the pixel Pij in the depth image, and the second probability value of the pixel Pi'j'.
Referring to fig. 2, in some embodiments, step 031 may be performed by the fusion module 13 of the image processing apparatus 10. That is, the fusion module 13 is configured to obtain a target segmentation value of a pixel Pi "j" corresponding to a pixel Pij in the target segmented image according to the segmentation value of each pixel Pij in the initial segmented image, the first probability value of the pixel Pij, the depth value of a pixel Pi 'j' corresponding to the pixel Pij in the depth image, and the second probability value of the pixel Pi 'j'.
Referring to fig. 3, in some embodiments, step 031 may also be performed by one or more processors 30 of terminal 100. That is, the one or more processors 30 are further configured to obtain a target segmentation value of the pixel Pi "j" corresponding to the pixel Pij in the target segmented image according to the segmentation value of each pixel Pij in the initial segmented image, the first probability value of the pixel Pij, the depth value of the pixel Pi 'j' corresponding to the pixel Pij in the depth image, and the second probability value of the pixel Pi 'j'.
It should be noted that, in some embodiments, before calculating the target segmentation value, the depth value of all pixels in the depth image is divided by the maximum depth value, so that the depth value of each pixel in the depth image is also in the range of 0 to 1, which is beneficial to enabling the calculated target segmentation value to be in the range of 0 to 1. The depth values in the depth image described in the following embodiments are processed depth values, that is, the depth values in the depth image described in the following embodiments are all in the range of 0 to 1, which is not described herein.
Referring to fig. 21, after the initial segmentation image, the first probability image, the depth image, and the second probability image are obtained, the processor 30 (or the fusion module 13) obtains a target segmentation value of the pixel Pi"j" at the position corresponding to the pixel Pij in the target segmentation image according to the segmentation value of each pixel Pij in the initial segmentation image, the first probability value of the pixel Pij, the depth value of the pixel Pi'j' corresponding to the pixel Pij in the depth image, and the second probability value of the pixel Pi'j'. Specifically, in some embodiments, the target segmentation value may be calculated by the formula S_out(i,j) = I2(i,j) * (1 - d(i,j)) * I1(i,j) * S_in(i,j), wherein S_out(i,j) represents the target segmentation value of the pixel Pi"j" at the position corresponding to the pixel Pij in the target segmentation image; S_in(i,j) represents the segmentation value of the pixel Pij in the initial segmentation image; I1(i,j) represents the first probability value of the pixel Pij; d(i,j) represents the depth value of the pixel Pi'j' corresponding to the pixel Pij in the depth image; and I2(i,j) represents the second probability value of the pixel Pi'j'. That is, the difference between the value 1 and the depth value of the pixel Pi'j' corresponding to the pixel Pij in the depth image is calculated first, and the target segmentation value of the pixel Pi"j" at the position corresponding to the pixel Pij in the target segmentation image is then calculated as the product of the segmentation value of the pixel Pij, the first probability value of the pixel Pij, the second probability value of the pixel Pi'j', and the difference. For example, taking the calculation of the target segmentation value of the 1st-row, 1st-column pixel in the target segmentation image as an example: first calculate the difference between the value 1 and the depth value of the 1st-row, 1st-column pixel in the depth image; then calculate the product of the segmentation value of the 1st-row, 1st-column pixel in the initial segmentation image, the first probability value I1 of the 1st-row, 1st-column pixel in the first probability image, the second probability value I2 of the 1st-row, 1st-column pixel in the second probability image, and the difference; and take the calculated product as the target segmentation value of the 1st-row, 1st-column pixel in the target segmentation image.
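A minimal sketch of this fusion, assuming the four inputs are numpy arrays of the same shape whose pixels already correspond one-to-one; variable names are assumptions.

```python
import numpy as np

def fuse_segmentation(s_in, i1, depth_image, i2):
    """S_out(i,j) = I2(i,j) * (1 - d(i,j)) * I1(i,j) * S_in(i,j), element-wise."""
    # Normalize depth values to the 0-1 range by dividing by the maximum depth,
    # as described above, so the resulting target segmentation values stay in 0-1.
    d = depth_image / depth_image.max()
    return i2 * (1.0 - d) * i1 * s_in
```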
Referring to fig. 1 and 22, in some embodiments, the initial segmentation image includes a segmentation value for each pixel, and step 03: obtaining a target segmentation image according to the initial segmentation image, the first probability image, the depth image and the second probability image, includes:
032: acquiring an intermediate segmentation value of the pixel Pi"j" at the position corresponding to the pixel Pij in the target segmentation image according to the segmentation value of each pixel Pij in the initial segmentation image, the first probability value of the pixel Pij, the depth value of the pixel Pi'j' corresponding to the pixel Pij in the depth image, and the second probability value of the pixel Pi'j';
033: obtaining the target segmentation value of each pixel Pi"j" according to the intermediate segmentation value of the pixel Pi"j" and the largest intermediate segmentation value among all the pixels Pi"j".
Referring to fig. 2, in some embodiments, steps 032 and 033 may be performed by the fusion module 13 of the image processing apparatus 10. That is, the fusion module 13 is further configured to acquire an intermediate segmentation value of the pixel Pi"j" at the position corresponding to the pixel Pij in the target segmentation image according to the segmentation value of each pixel Pij in the initial segmentation image, the first probability value of the pixel Pij, the depth value of the pixel Pi'j' corresponding to the pixel Pij in the depth image, and the second probability value of the pixel Pi'j'; and to obtain the target segmentation value of each pixel Pi"j" according to the intermediate segmentation value of the pixel Pi"j" and the largest intermediate segmentation value among all the pixels Pi"j".
Referring to fig. 3, in some embodiments, steps 032 and 033 may be implemented by one or more processors 30 of the terminal 100. That is, the one or more processors 30 are further configured to acquire an intermediate segmentation value of the pixel Pi"j" at the position corresponding to the pixel Pij in the target segmentation image according to the segmentation value of each pixel Pij in the initial segmentation image, the first probability value of the pixel Pij, the depth value of the pixel Pi'j' corresponding to the pixel Pij in the depth image, and the second probability value of the pixel Pi'j'; and to obtain the target segmentation value of each pixel Pi"j" according to the intermediate segmentation value of the pixel Pi"j" and the largest intermediate segmentation value among all the pixels Pi"j".
Referring to fig. 21, after the initial segmentation image, the first probability image, the depth image, and the second probability image are obtained, the processor 30 (or the fusion module 13) obtains an intermediate segmentation value of the pixel Pi"j" at the position corresponding to the pixel Pij in the target segmentation image according to the segmentation value of each pixel Pij in the initial segmentation image, the first probability value of the pixel Pij, the depth value of the pixel Pi'j' corresponding to the pixel Pij in the depth image, and the second probability value of the pixel Pi'j'; the target segmentation value of each pixel Pi"j" is then obtained according to the intermediate segmentation value of that pixel and the maximum intermediate segmentation value among all the pixels Pi"j". In this embodiment, after the intermediate segmentation values of the pixels Pi"j" are obtained in this way, all the intermediate segmentation values are divided by the obtained maximum intermediate segmentation value to obtain the target segmentation values. Compared with using the intermediate segmentation value directly as the target segmentation value (i.e., without dividing by the maximum intermediate segmentation value), this avoids the target segmentation values in the target segmentation image being overly dispersed, which is beneficial for subsequently distinguishing the portrait region in the target segmentation image.
Specifically, in some embodiments, the target segmentation value may be calculated by the formula S_out(i,j) = [I2(i,j) * (1 - d(i,j)) * I1(i,j) * S_in(i,j)] / E, wherein S_out(i,j) represents the target segmentation value of the pixel Pi"j" at the position corresponding to the pixel Pij in the target segmentation image; S_in(i,j) represents the segmentation value of the pixel Pij in the initial segmentation image; I1(i,j) represents the first probability value of the pixel Pij; d(i,j) represents the depth value of the pixel Pi'j' corresponding to the pixel Pij in the depth image; I2(i,j) represents the second probability value of the pixel Pi'j'; and E is the largest intermediate segmentation value among all the pixels Pi"j". That is, the difference between the value 1 and the depth value of the pixel Pi'j' corresponding to the pixel Pij in the depth image is calculated first, and the intermediate segmentation value of the pixel Pi"j" at the position corresponding to the pixel Pij in the target segmentation image is then calculated as the product of the segmentation value of the pixel Pij, the first probability value of the pixel Pij, the second probability value of the pixel Pi'j', and the difference. After the intermediate segmentation values of all the pixels are acquired, the intermediate segmentation value of the pixel Pi"j" is divided by the largest intermediate segmentation value among all the pixels Pi"j" of the target segmentation image to obtain the target segmentation value of the pixel Pi"j". For example, taking the calculation of the target segmentation value of the 1st-row, 1st-column pixel in the target segmentation image as an example: first calculate the difference between the value 1 and the depth value of the 1st-row, 1st-column pixel in the depth image; then calculate the product of the segmentation value of the 1st-row, 1st-column pixel in the initial segmentation image, the first probability value I1 of the 1st-row, 1st-column pixel in the first probability image, the second probability value I2 of the 1st-row, 1st-column pixel in the second probability image, and the difference, and take the calculated product as the intermediate segmentation value of the 1st-row, 1st-column pixel in the target segmentation image. The intermediate segmentation values of all pixels in the target segmentation image are then calculated; assume the maximum intermediate segmentation value is m. The result obtained by dividing the intermediate segmentation value of the 1st-row, 1st-column pixel by the maximum intermediate segmentation value m is used as the target segmentation value of the 1st-row, 1st-column pixel in the target segmentation image.
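A minimal sketch of the normalized variant (steps 032 and 033), under the same assumptions as the previous snippet.

```python
import numpy as np

def fuse_segmentation_normalized(s_in, i1, depth_image, i2):
    """Compute the intermediate value per pixel, then divide by its maximum E."""
    d = depth_image / depth_image.max()
    intermediate = i2 * (1.0 - d) * i1 * s_in            # step 032
    e = intermediate.max()                               # largest intermediate segmentation value E
    return intermediate / e if e > 0 else intermediate   # step 033
```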
It should be noted that, in some embodiments, if the target segmentation value of a pixel in the target segmentation image is greater than the second preset segmentation value, the region where that pixel is located is considered to be a portrait region (e.g. the white region in the target segmentation image in fig. 21); if the target segmentation value is not greater than the second preset segmentation value, the region where the pixel is located is considered to be the background region (e.g. the black region in the target segmentation image in fig. 21). The second preset segmentation value may be the same as or different from the first preset segmentation value in the above embodiments, which is not limited herein.
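A minimal sketch of this thresholding, assuming target segmentation values in the 0-1 range; the threshold of 0.5 is an assumed example, since the text only requires some second preset segmentation value.

```python
import numpy as np

def portrait_mask(target_seg, second_preset_value=0.5):
    """Pixels whose target segmentation value exceeds the second preset value form the portrait region."""
    return target_seg > second_preset_value
```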
Because the target segmentation value is obtained by fusing the segmentation value in the initial segmentation image, the first probability value, the depth value of the depth image and the second probability value, compared with directly segmenting the original image with a single segmentation model, this can avoid problems such as false detection and missed detection in complex scenes, and improves the stability and accuracy of portrait region segmentation.
Referring to fig. 23, in some embodiments, the target segmentation image includes a portrait region, and the image processing method further includes:
05: and performing second image processing on the original image according to the portrait area in the target segmentation image to acquire a target image.
Referring to fig. 2, in some embodiments, the image processing apparatus 10 further includes a processing module 14, and the step 05 may be performed by the processing module 14. That is, the processing module 14 is configured to perform the second image processing on the original image according to the portrait region in the target divided image to obtain the target image.
Referring to fig. 3, in some embodiments, the step 05 may also be implemented by one or more processors 30 of the terminal 100. That is, the one or more processors 30 are further configured to perform a second image processing on the original image to obtain the target image according to the portrait region in the target divided image.
In some embodiments, after the target segmentation image including the portrait region is acquired, the original image may be subjected to the second image processing according to the portrait region of the target segmentation image to acquire the target image. Specifically, in some embodiments, the processor 30 (or the processing module 14) obtains the portrait region in the original image from the portrait region of the target segmentation image (i.e., the white region portion in the target segmentation image of fig. 21). Illustratively, in one example, if the original image was scaled during preprocessing, the target segmentation image is enlarged to the same size as the original image; if the original image was rotated during preprocessing, the target segmentation image is rotated in reverse. For example, when the original image was rotated by 90° to the left during preprocessing, the target segmentation image is rotated by 90° to the right so that the target segmentation image corresponds to the original image. Once the target segmentation image corresponds to the original image, the position in the original image corresponding to the portrait region in the target segmentation image (i.e., the white region in the target segmentation image in fig. 21) is the portrait region in the original image. Of course, other ways may also be used to obtain the portrait region in the original image from the portrait region in the target segmentation image, which is not limited herein. After the portrait region of the original image is acquired, the processor 30 (or the processing module 14) may perform background blurring processing on the region other than the portrait region in the original image; alternatively, the processor 30 (or the processing module 14) may beautify the portrait region in the original image; or the processor 30 (or the processing module 14) may extract the portrait region from the original image and place it in another frame of image to generate a new image containing the portrait of the original image.
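As one possible sketch of the background blurring variant of the second image processing, assuming a boolean portrait mask that has already been mapped back to the original image's size and orientation (the resizing/rotation helpers are not shown); the kernel size is an assumed choice.

```python
import cv2
import numpy as np

def blur_background(original_bgr, mask):
    """Blur everything outside the portrait region of the original image.

    original_bgr: HxWx3 color image; mask: HxW boolean portrait mask.
    """
    blurred = cv2.GaussianBlur(original_bgr, (21, 21), 0)
    mask3 = np.repeat(mask[:, :, None], 3, axis=2)
    # Keep the portrait pixels, replace everything else with the blurred image.
    return np.where(mask3, original_bgr, blurred)
```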
Referring to fig. 24, the present application also provides a non-transitory computer readable storage medium 400 containing a computer program 410. The computer program, when executed by the processor 420, causes the processor 420 to perform the image processing method of any one of the embodiments described above.
Referring to fig. 1, for example, when the computer program 410 is executed by the processor 420, the processor 420 is caused to perform the methods of steps 01, 011, 0111, 0112, 0113, 012, 02, 021, 022, 023, 0231, 0232, 024, 0241, 0242, 03, 031, 032, 033, 04, and 05. For example, the following image processing method is performed (see the combined sketch after these steps):
01: performing portrait segmentation processing on the acquired original image to obtain an initial segmentation image and a first probability image, wherein the initial segmentation image includes a portrait area and a background area, the first probability image includes a first probability value of each pixel in the initial segmentation image, and the first probability value represents the probability that each pixel in the initial segmentation image belongs to the portrait area;
02: acquiring a depth image and a second probability image, wherein the depth image is used for indicating the depth value of each pixel in the original image, and the second probability image comprises a second probability value corresponding to each pixel in the depth image; and
03: acquiring a target segmentation image according to the initial segmentation image, the first probability image, the depth image and the second probability image.
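The following hypothetical end-to-end sketch ties these three steps together using the helper functions sketched earlier (align_depth_to_color, second_probability_image, fuse_segmentation_normalized); portrait_segmentation stands in for the preset segmentation model and, like the camera parameters, is an assumption rather than the patent's exact API.

```python
import cv2

def process(original_image, original_depth, K_depth, K_color, T_depth2world, T_world2color):
    # Step 01: portrait segmentation -> initial segmentation image + first probability image.
    s_in, i1 = portrait_segmentation(original_image)  # assumed segmentation model

    # Step 02: align, scale and smooth the depth data, then derive the second probability image.
    aligned = align_depth_to_color(original_depth, K_depth, K_color,
                                   T_depth2world, T_world2color)
    scaled = cv2.resize(aligned, (s_in.shape[1], s_in.shape[0]),
                        interpolation=cv2.INTER_LINEAR)
    depth_image = cv2.GaussianBlur(scaled, (5, 5), 0)
    i2 = second_probability_image(depth_image)

    # Step 03: fuse everything into the target segmentation image.
    return fuse_segmentation_normalized(s_in, i1, depth_image, i2)
```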
It should be noted that the processor 420 may be disposed in the terminal 100, that is, the processor 420 and the processor 30 are the same processor, and of course, the processor 420 may not be disposed in the terminal 100, that is, the processor 420 and the processor 30 are not the same processor, which is not limited herein.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Furthermore, the different embodiments or examples described in this specification and the features of the different embodiments or examples may be combined and combined by those skilled in the art without contradiction.
Any process or method descriptions in flow charts or otherwise herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present application in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the embodiments of the present application.
While embodiments of the present application have been shown and described above, it will be understood that the above embodiments are illustrative and not to be construed as limiting the present application, and that variations, modifications, alternatives, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the present application.

Claims (11)

1. An image processing method, comprising:
performing portrait segmentation processing on the obtained original image to obtain an initial segmented image and a first probability image, wherein the initial segmented image comprises a portrait area and a background area, the first probability image comprises a first probability value of each pixel in the initial segmented image, and the first probability value represents the probability of each pixel in the initial segmented image in the portrait area;
Acquiring a depth image and a second probability image, wherein the depth image is used for indicating the depth value of each pixel in the original image, and the second probability image comprises a second probability value corresponding to each pixel in the depth image; and
acquiring a target segmentation image according to the initial segmentation image, the first probability image, the depth image and the second probability image.
2. The image processing method according to claim 1, wherein the performing a human image segmentation process on the acquired original image to obtain an initial segmented image and a first probability image, comprises:
preprocessing the original image to obtain a preprocessed image;
and inputting the preprocessed image into a preset segmentation model to acquire the initial segmentation image and the first probability image.
3. The image processing method according to claim 1, wherein the second probability value characterizes a probability that a depth value of each pixel in the depth image is the corresponding depth value; the obtaining the depth image and the second probability image includes:
preprocessing the original image to obtain a preprocessed image; and
inputting the preprocessed image into a depth estimation network model to obtain the depth image and the second probability image.
4. The image processing method according to claim 1, characterized in that the image processing method further comprises:
acquiring an original depth information map by adopting a depth information acquisition module;
the obtaining the depth image and the second probability image includes:
performing first image processing on the original depth information image according to the initial segmentation image to obtain the depth image corresponding to the initial segmentation image; and
acquiring a corresponding second probability value according to the depth value of each pixel in the depth image so as to obtain the second probability image.
5. The image processing method according to claim 4, wherein the obtaining the corresponding second probability value according to the depth value of each pixel in the depth image includes:
calculating the ratio between the depth value of each pixel in the depth image and the maximum depth value in the depth image;
and calculating a difference value between a preset value and the ratio to acquire a second probability value of each pixel in the depth image.
6. The image processing method according to any one of claims 1 to 5, wherein the initial segmentation image includes a segmentation value of each pixel; the acquiring a target segmentation image according to the initial segmentation image, the first probability image, the depth image and the second probability image includes:
obtaining a target segmentation value of a pixel Pi"j" at a position corresponding to the pixel Pij in the target segmentation image according to the segmentation value of each pixel Pij in the initial segmentation image, the first probability value of the pixel Pij, the depth value of a pixel Pi'j' corresponding to the pixel Pij in the depth image, and the second probability value of the pixel Pi'j'.
7. The image processing method according to any one of claims 1 to 5, wherein the initial segmentation image includes a segmentation value of each pixel; the acquiring a target segmentation image according to the initial segmentation image, the first probability image, the depth image and the second probability image includes:
acquiring an intermediate segmentation value of a pixel Pi"j" at a position corresponding to the pixel Pij in the target segmentation image according to the segmentation value of each pixel Pij in the initial segmentation image, the first probability value of the pixel Pij, the depth value of a pixel Pi'j' corresponding to the pixel Pij in the depth image and the second probability value of the pixel Pi'j';
and obtaining a target segmentation value of each pixel Pi"j" according to the intermediate segmentation value of each pixel Pi"j" and the largest intermediate segmentation value among all the pixels Pi"j".
8. The image processing method according to claim 6, wherein the target segmentation image includes a portrait area, the image processing method further comprising:
and performing second image processing on the original image according to the portrait area in the target segmentation image so as to acquire a target image.
9. An image processing apparatus, comprising:
the first acquisition module is used for carrying out portrait segmentation processing on an acquired original image so as to obtain an initial segmentation image and a first probability image, wherein the initial segmentation image comprises a portrait area and a background area, the first probability image comprises a first probability value of each pixel in the initial segmentation image, and the first probability value represents the probability of each pixel in the initial segmentation image in the portrait area;
the second acquisition module is used for acquiring a depth image and a second probability image, the depth image is used for indicating the depth value of each pixel in the original image, and the second probability image comprises a second probability value corresponding to each pixel in the depth image; and
the fusion module is used for acquiring a target segmentation image according to the initial segmentation image, the first probability image, the depth image and the second probability image.
10. A terminal, comprising:
a housing; and
One or more processors coupled to the housing, the one or more processors configured to perform the image processing method of any of claims 1-8.
11. A non-transitory computer-readable storage medium containing a computer program, characterized in that the computer program, when executed by a processor, causes the processor to perform the image processing method of any one of claims 1 to 8.
CN202110636329.7A 2021-06-08 2021-06-08 Image processing method, image processing device, terminal and readable storage medium Active CN113409331B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110636329.7A CN113409331B (en) 2021-06-08 2021-06-08 Image processing method, image processing device, terminal and readable storage medium

Publications (2)

Publication Number Publication Date
CN113409331A CN113409331A (en) 2021-09-17
CN113409331B true CN113409331B (en) 2024-04-12

Family

ID=77676946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110636329.7A Active CN113409331B (en) 2021-06-08 2021-06-08 Image processing method, image processing device, terminal and readable storage medium

Country Status (1)

Country Link
CN (1) CN113409331B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116883479B (en) * 2023-05-29 2023-11-28 杭州飞步科技有限公司 Monocular image depth map generation method, monocular image depth map generation device, monocular image depth map generation equipment and monocular image depth map generation medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102521646A (en) * 2011-11-11 2012-06-27 浙江捷尚视觉科技有限公司 Complex scene people counting algorithm based on depth information cluster
CN106651867A (en) * 2017-01-04 2017-05-10 努比亚技术有限公司 Interactive image segmentation method and apparatus, and terminal
WO2019023819A1 (en) * 2017-08-03 2019-02-07 汕头市超声仪器研究所有限公司 Simulated and measured data-based multi-target three-dimensional ultrasound image segmentation method
CN111340864A (en) * 2020-02-26 2020-06-26 浙江大华技术股份有限公司 Monocular estimation-based three-dimensional scene fusion method and device
CN112634296A (en) * 2020-10-12 2021-04-09 深圳大学 RGB-D image semantic segmentation method and terminal for guiding edge information distillation through door mechanism
CN112258618A (en) * 2020-11-04 2021-01-22 中国科学院空天信息创新研究院 Semantic mapping and positioning method based on fusion of prior laser point cloud and depth map
CN112365604A (en) * 2020-11-05 2021-02-12 深圳市中科先见医疗科技有限公司 AR equipment depth of field information application method based on semantic segmentation and SLAM

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Depth video correction algorithm based on image segmentation; Jiao Renzhi; Chen Fen; Wang Hui; Peng Zongju; Jiang Gangyi; Journal of Optoelectronics·Laser; 2016-01-15 (No. 01); full text *

Also Published As

Publication number Publication date
CN113409331A (en) 2021-09-17

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant