CN113129319A - Image processing method, image processing device, computer equipment and storage medium - Google Patents

Image processing method, image processing device, computer equipment and storage medium

Info

Publication number: CN113129319A
Authority: CN (China)
Prior art keywords: image, feature, processed, skin, dimension
Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202110473429.2A
Other languages: Chinese (zh)
Other versions: CN113129319B (en)
Inventors: 吴尧, 四建楼
Current Assignee: Beijing Sensetime Technology Development Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Beijing Sensetime Technology Development Co Ltd
Application filed by Beijing Sensetime Technology Development Co Ltd
Priority to CN202110473429.2A
Publication of CN113129319A
Application granted; publication of CN113129319B
Current legal status: Active

Classifications

    • G — PHYSICS; G06 — COMPUTING; CALCULATING OR COUNTING
    • G06T — IMAGE DATA PROCESSING OR GENERATION, IN GENERAL: G06T7/00 Image analysis; G06T7/10 Segmentation; Edge detection; G06T7/11 Region-based segmentation
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS: G06N3/00 Computing arrangements based on biological models; G06N3/02 Neural networks; G06N3/04 Architecture, e.g. interconnection topology; G06N3/08 Learning methods
    • G06T5/00 Image enhancement or restoration; G06T5/77 Retouching; Inpainting; Scratch removal
    • G06T2207/00 Indexing scheme for image analysis or image enhancement: G06T2207/10 Image acquisition modality; G06T2207/10004 Still image; Photographic image; G06T2207/10024 Color image
    • G06T2207/20 Special algorithmic details: G06T2207/20081 Training; Learning; G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30 Subject of image; Context of image processing: G06T2207/30196 Human being; Person; G06T2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure provides an image processing method, apparatus, computer device, and storage medium, wherein the method comprises: acquiring an image to be processed; performing skin region segmentation on the image to be processed by using a pre-trained deep neural network to obtain a skin segmentation image corresponding to the image to be processed, wherein the skin segmentation image marks a skin region and a non-skin region; and performing special effect processing on the skin region and/or the non-skin region in the image to be processed based on the skin segmentation image and the image to be processed. By performing skin region segmentation on the image to be processed with the trained deep neural network, the method and the device can accurately determine the skin segmentation result and improve the precision of skin segmentation.

Description

Image processing method, image processing device, computer equipment and storage medium
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to an image processing method and apparatus, a computer device, and a storage medium.
Background
Image processing is becoming increasingly diverse. In many scenarios, a user needs to apply special effect processing such as beautification to regions of an image such as the user's face or hands in order to change the appearance of the skin, for example by skin smoothing or whitening. All of these operations require knowing in advance which pixel points in the image belong to skin; that is, the image must first undergo skin segmentation.
Most existing skin segmentation approaches segment the skin region of the image to be processed based on color space analysis. Such approaches are easily affected by the background color of the image or by the illumination intensity at the time the image was captured, which reduces the precision of the skin segmentation result and, in turn, degrades the effect of the image special effect processing.
Disclosure of Invention
The embodiment of the disclosure at least provides an image processing method, an image processing device, a computer device and a storage medium, so as to improve the accuracy of skin segmentation.
In a first aspect, an embodiment of the present disclosure provides an image processing method, including:
acquiring an image to be processed;
carrying out segmentation processing on a skin area on the image to be processed by utilizing a pre-trained deep neural network to obtain a skin segmentation image corresponding to the image to be processed, wherein the skin segmentation image is marked with a skin area and a non-skin area;
and performing special effect processing on a skin area and/or a non-skin area in the image to be processed based on the skin segmentation image and the image to be processed.
The method can determine image characteristic information reflecting the depth characteristics of the image to be processed by using the trained deep neural network, and can accurately determine whether each pixel point in the image to be processed belongs to the skin or not based on the depth characteristics of the image to be processed. Furthermore, the special effect processing is carried out on the image to be processed by using the accurate skin segmentation result, so that the effect of the special effect processing can be improved.
In a possible implementation manner, the performing, by using a deep neural network trained in advance, segmentation processing on a skin region of the image to be processed to obtain a skin segmentation image corresponding to the image to be processed includes:
extracting image characteristic information and structural information of the image to be processed by utilizing the pre-trained deep neural network;
and performing segmentation processing on the skin area of the image to be processed based on the extracted image characteristic information and the extracted structural information to obtain a skin segmentation image corresponding to the image to be processed.
The image feature information can reflect the depth features of the image to be processed, the structured information can reflect the position relationship among the feature points in the image feature information, and based on the position relationship among the feature points and the depth features corresponding to the feature points, whether each pixel point belongs to the skin or not can be accurately determined, namely, the skin segmentation result can be accurately determined, and an accurate skin segmentation image can be obtained.
In a possible implementation manner, the performing, on the basis of the extracted image feature information and the extracted structural information, segmentation processing on a skin region on the image to be processed to obtain a skin segmentation image corresponding to the image to be processed includes:
extracting image characteristic information corresponding to different characteristic dimensions in the image to be processed respectively by utilizing the pre-trained deep neural network; wherein the plurality of feature dimensions include a first feature dimension and a second feature dimension that are adjacent, the first feature dimension being lower than the second feature dimension; the image feature information corresponding to the first feature dimension is determined based on the image feature information corresponding to the second feature dimension and the structural information of the image feature information corresponding to the second feature dimension; the structural information of the image characteristic information corresponding to the second characteristic dimension is extracted by using the pre-trained deep neural network;
and carrying out segmentation processing on the skin area of the image to be processed based on the image characteristic information respectively corresponding to different characteristic dimensions to obtain a skin segmentation image corresponding to the image to be processed.
The low characteristic dimension can reflect the depth characteristic of the main body part of the image to be processed, and the high characteristic dimension can reflect the depth characteristic of the edge part of the image to be processed, so that the integral depth characteristic of the image to be processed can be completely and accurately reflected by utilizing the image characteristic information of different characteristic dimensions, and therefore, the skin segmentation is carried out based on the image characteristic information of different characteristic dimensions, and the skin segmentation precision is improved.
In a possible implementation manner, the structured information of the image feature information corresponding to the second feature dimension includes a positional relationship between first feature points in the image feature information corresponding to the second feature dimension;
the method further comprises the step of determining image feature information corresponding to the first feature dimension:
for each second feature point in the first feature dimension, based on the position information of the second feature point, screening a first target feature point corresponding to the second feature point from the first feature points corresponding to the second feature dimension;
determining, based on the first feature dimension and the second feature dimension, a target number of second feature points in the first feature dimension that correspond to first feature points in the second feature dimension;
screening second target feature points of the target number from the first feature points corresponding to the second feature dimensions based on the position relation among the first feature points and the position information of the first target feature points;
and determining the image feature information of the second feature point based on the image feature information of the second target feature point, and determining the image feature information corresponding to the first feature dimension based on the determined image feature information of each second feature point in the first feature dimension.
By using the structured information in the image feature information of the second feature dimension, the second target feature point corresponding to each second feature point in the image feature information of the first feature dimension can be accurately determined, and the depth feature of each second feature point in the image feature information of the first feature dimension can be accurately determined by combining the depth features of each first feature point in the second target feature points, so that one second feature point in the image feature information of the first feature dimension can reflect the depth features of a plurality of second feature points in the image feature information of adjacent feature dimensions.
In a possible implementation manner, the segmenting process of the skin region on the image to be processed based on the image feature information respectively corresponding to different feature dimensions includes:
for each feature dimension in the different feature dimensions, determining a first semantic prediction result of the image to be processed under the feature dimension based on image feature information corresponding to the feature dimension;
determining the probability that each pixel point in the image to be processed is a pixel point corresponding to skin based on a first semantic prediction result of the image to be processed under each feature dimension;
and carrying out segmentation processing on the skin area of the image to be processed based on the probability that each pixel point in the image to be processed is a pixel point corresponding to the skin and a preset segmentation probability value.
The first semantic prediction result is used for representing the probability that each pixel point is a pixel point corresponding to the skin, the pixel points with lower probability can be screened out by utilizing the preset segmentation probability value, the pixel points with higher probability are reserved, and the skin segmentation is carried out by utilizing the probability of multiple dimensions corresponding to the pixel points and the preset segmentation probability value, so that the improvement of the skin segmentation precision is facilitated.
In a possible implementation manner, the determining, based on a first semantic prediction result of the to-be-processed image in each feature dimension, a probability that each pixel point in the to-be-processed image is a pixel point corresponding to a skin includes:
performing multiple times of fusion processing according to the sequence of the different feature dimensions from low to high to obtain the probability that each pixel point in the image to be processed is a pixel point corresponding to the skin;
wherein, the ith fusion processing in the multiple fusion processing comprises the following steps:
determining confidence information of a first semantic prediction result under the first feature dimension;
fusing the first semantic prediction result under the first characteristic dimension and the first semantic prediction result under the second characteristic dimension by using the confidence information of the first semantic prediction result under the first characteristic dimension to obtain a target semantic prediction result under the second characteristic dimension;
and updating the target semantic prediction result into a first semantic prediction result of a first feature dimension in the (i + 1) th fusion process.
The confidence information can reflect the accuracy of the first semantic prediction result, multiple times of fusion processing are performed according to the sequence of different feature dimensions from low to high, and finally a target semantic prediction result fused with each first semantic prediction result is obtained, so that the deep neural network can generate different attention to the first semantic prediction results of the multiple feature dimensions, and the accuracy of the deep neural network is improved.
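For illustration only (this example is not part of the disclosure), a confidence-weighted fusion of two first semantic prediction results might be sketched as follows; the exact fusion formula, the confidence measure, the tensor shapes, and the use of PyTorch are assumptions made for the example.

    import torch
    import torch.nn.functional as F

    def fuse_predictions(pred_low, pred_high):
        """Hypothetical i-th fusion step: pred_low / pred_high are the first
        semantic prediction results under the first (lower) and second (higher)
        feature dimensions, each of shape [N, 1, H, W] with values in [0, 1]."""
        # Bring the lower-dimension prediction to the resolution of the higher one.
        pred_low_up = F.interpolate(pred_low, size=pred_high.shape[-2:],
                                    mode="bilinear", align_corners=False)
        # Assumed confidence measure: distance of the probability from 0.5.
        confidence = (pred_low_up - 0.5).abs() * 2
        # Confidence-weighted fusion; the result would serve as the first semantic
        # prediction of the first feature dimension in the (i + 1)-th fusion.
        target = confidence * pred_low_up + (1 - confidence) * pred_high
        return target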
In a possible embodiment, the determining, for each of the different feature dimensions, a first semantic prediction result of the image to be processed in the feature dimension based on image feature information corresponding to the feature dimension includes:
aiming at the lowest feature dimension, determining a first semantic prediction result of the image to be processed under the lowest feature dimension based on image feature information corresponding to the lowest feature dimension;
and for each second feature dimension except the lowest feature dimension, determining a first semantic prediction result of the image to be processed under the second feature dimension based on the image feature information corresponding to the second feature dimension and the first semantic prediction result of the image to be processed under the first feature dimension.
And determining a first semantic prediction result of the current feature dimension according to the first semantic prediction result of the previous feature dimension and the image feature information corresponding to the current feature dimension, so that the first semantic prediction result carries the features of each feature dimension, and further the accuracy of the neural network is improved.
In one possible embodiment, the method further comprises the step of training the deep neural network:
obtaining a sample image, the sample image containing a skin region;
inputting the sample image into a deep neural network to be trained, and determining a prediction segmentation image of the sample image;
and generating a target loss based on the sample segmentation image corresponding to the sample image and the prediction segmentation image, and training the deep neural network to be trained by using the target loss to obtain the trained deep neural network, wherein the sample segmentation image is marked with skin identification information.
The target loss is determined based on the sample segmentation image and the prediction segmentation image, the deep neural network to be trained is trained by utilizing the determined target loss, and the deep neural network obtained through training can be ensured to output an accurate prediction segmentation image, so that accurate segmentation of the skin area is realized.
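Purely as an illustrative aid and not as part of the disclosed method, one training iteration of this kind could look as follows; the use of PyTorch and the choice of binary cross-entropy as the form of the target loss are assumptions made for the example.

    import torch
    import torch.nn.functional as F

    def train_step(model, optimizer, sample_image, sample_segmentation):
        """One hypothetical training iteration: predict a segmentation for the
        sample image, compare it with the annotated sample segmentation image
        (skin identification information), and update the network to be trained."""
        model.train()
        predicted_segmentation = model(sample_image)              # prediction segmentation image
        target_loss = F.binary_cross_entropy(predicted_segmentation,
                                             sample_segmentation)  # assumed loss form
        optimizer.zero_grad()
        target_loss.backward()
        optimizer.step()
        return target_loss.item()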
In one possible embodiment, the acquiring the sample image includes:
acquiring a first sample image having different skin characteristics;
performing an image brightness changing operation on at least part of the first sample images to obtain second sample images;
taking the first sample image and the second sample image as the sample images.
The sample images with different skin characteristics are used for training the deep neural network to be trained, so that the skin segmentation precision of the trained deep neural network when the skin segmentation is performed on the images to be processed with different skin characteristics can be improved. In addition, the second sample image with the changed image brightness is used for training the deep neural network to be trained, so that the adaptability of the trained deep neural network to the image brightness change can be improved, and the skin segmentation precision is further improved.
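A minimal sketch of such an image brightness changing operation is given below for illustration; the NumPy representation, the random gain range, and the fraction of first sample images that are modified are assumptions, since the disclosure does not fix these details.

    import numpy as np

    def change_brightness(first_sample_image: np.ndarray, gain: float) -> np.ndarray:
        """Scale the pixel intensities of an RGB image (uint8, HxWx3) by `gain`
        to obtain a second sample image with changed brightness."""
        adjusted = first_sample_image.astype(np.float32) * gain
        return np.clip(adjusted, 0, 255).astype(np.uint8)

    def build_sample_images(first_sample_images, rng=np.random.default_rng(0)):
        """Apply the brightness change to part of the first sample images and use
        both the originals and the modified copies as sample images for training."""
        second_sample_images = [
            change_brightness(img, gain=rng.uniform(0.6, 1.4))                    # assumed gain range
            for img in first_sample_images[: len(first_sample_images) // 2]       # assumed: half of them
        ]
        return list(first_sample_images) + second_sample_images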
In one possible embodiment, the acquiring the sample image includes:
acquiring a third sample image under different illumination intensities;
taking the first sample image, the second sample image, and the third sample image as the sample images.
The obtained third sample images under different illumination intensities are used for training the deep neural network to be trained, so that the number of the sample images used for training can be increased, the adaptability of the trained deep neural network to illumination changes can be improved, and the skin segmentation precision is improved.
In a second aspect, an embodiment of the present disclosure further provides an image processing apparatus, including:
the acquisition module is used for acquiring an image to be processed;
the segmentation module is used for carrying out segmentation processing on a skin area on the image to be processed by utilizing a pre-trained deep neural network to obtain a skin segmentation image corresponding to the image to be processed, wherein the skin segmentation image is marked with a skin area and a non-skin area;
and the processing module is used for carrying out special effect processing on a skin area and/or a non-skin area in the image to be processed based on the skin segmentation image and the image to be processed.
In a possible implementation manner, the segmentation module is configured to extract image feature information and structural information of the image to be processed by using the deep neural network trained in advance;
and performing segmentation processing on the skin area of the image to be processed based on the extracted image characteristic information and the extracted structural information to obtain a skin segmentation image corresponding to the image to be processed.
In a possible implementation manner, the segmentation module is configured to extract, by using the deep neural network trained in advance, image feature information corresponding to different feature dimensions in the image to be processed; wherein the plurality of feature dimensions include a first feature dimension and a second feature dimension that are adjacent, the first feature dimension being lower than the second feature dimension; the image feature information corresponding to the first feature dimension is determined based on the image feature information corresponding to the second feature dimension and the structural information of the image feature information corresponding to the second feature dimension; the structural information of the image characteristic information corresponding to the second characteristic dimension is extracted by using the pre-trained deep neural network;
and carrying out segmentation processing on the skin area of the image to be processed based on the image characteristic information respectively corresponding to different characteristic dimensions to obtain a skin segmentation image corresponding to the image to be processed.
In a possible implementation manner, the structured information of the image feature information corresponding to the second feature dimension includes a positional relationship between first feature points in the image feature information corresponding to the second feature dimension;
the segmentation module is further configured to determine image feature information corresponding to the first feature dimension according to the following steps:
for each second feature point in the first feature dimension, based on the position information of the second feature point, screening a first target feature point corresponding to the second feature point from the first feature points corresponding to the second feature dimension;
determining, based on the first feature dimension and the second feature dimension, a target number of second feature points in the first feature dimension that correspond to first feature points in the second feature dimension;
screening second target feature points of the target number from the first feature points corresponding to the second feature dimensions based on the position relation among the first feature points and the position information of the first target feature points;
and determining the image feature information of the second feature point based on the image feature information of the second target feature point, and determining the image feature information corresponding to the first feature dimension based on the determined image feature information of each second feature point in the first feature dimension.
In a possible implementation manner, the segmentation module is configured to determine, for each of the different feature dimensions, a first semantic prediction result of the image to be processed in the feature dimension based on image feature information corresponding to the feature dimension;
determining the probability that each pixel point in the image to be processed is a pixel point corresponding to skin based on a first semantic prediction result of the image to be processed under each feature dimension;
and carrying out segmentation processing on the skin area of the image to be processed based on the probability that each pixel point in the image to be processed is a pixel point corresponding to the skin and a preset segmentation probability value.
In a possible implementation manner, the segmentation module is configured to perform multiple times of fusion processing according to the sequence from low to high of the different feature dimensions, and then obtain a probability that each pixel point in the image to be processed is a pixel point corresponding to the skin;
wherein, the ith fusion processing in the multiple fusion processing comprises the following steps:
determining confidence information of a first semantic prediction result under the first feature dimension;
fusing the first semantic prediction result under the first characteristic dimension and the first semantic prediction result under the second characteristic dimension by using the confidence information of the first semantic prediction result under the first characteristic dimension to obtain a target semantic prediction result under the second characteristic dimension;
and updating the target semantic prediction result into a first semantic prediction result of a first feature dimension in the (i + 1) th fusion process.
In a possible implementation manner, the segmentation module is configured to determine, for a lowest feature dimension, a first semantic prediction result of the image to be processed in the lowest feature dimension based on image feature information corresponding to the lowest feature dimension;
and for each second feature dimension except the lowest feature dimension, determining a first semantic prediction result of the image to be processed under the second feature dimension based on the image feature information corresponding to the second feature dimension and the first semantic prediction result of the image to be processed under the first feature dimension.
In a possible embodiment, the apparatus further comprises:
a training module for training the deep neural network according to the following steps:
obtaining a sample image, the sample image containing a skin region;
inputting the sample image into a deep neural network to be trained, and determining a prediction segmentation image of the sample image;
and generating a target loss based on the sample segmentation image corresponding to the sample image and the prediction segmentation image, and training the deep neural network to be trained by using the target loss to obtain the trained deep neural network, wherein the sample segmentation image is marked with skin identification information.
In one possible embodiment, the training module is configured to acquire a first sample image having different skin features;
performing an image brightness changing operation on at least part of the first sample images to obtain second sample images;
taking the first sample image and the second sample image as the sample images.
In a possible implementation manner, the training module is used for acquiring a third sample image under different illumination intensities;
taking the first sample image, the second sample image, and the third sample image as the sample images.
In a third aspect, an embodiment of the present disclosure further provides a computer device, including a processor and a memory, where the memory stores machine-readable instructions executable by the processor and the processor is configured to execute the machine-readable instructions stored in the memory; when the machine-readable instructions are executed by the processor, the processor performs the steps in the first aspect or in any one of the possible implementations of the first aspect.
In a fourth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium having a computer program stored thereon, where, when the computer program is run, the steps in the first aspect or in any one of the possible implementations of the first aspect are performed.
For the description of the effects of the image processing apparatus, the computer device, and the computer-readable storage medium, reference is made to the description of the image processing method, which is not repeated here.
The image processing method, the image processing device, the computer equipment and the storage medium provided by the embodiment of the disclosure can determine the image characteristic information reflecting the depth characteristic of the image to be processed by using the trained deep neural network, and can accurately determine whether each pixel point in the image to be processed belongs to the skin or not based on the depth characteristic of the image to be processed. Furthermore, the special effect processing is carried out on the image to be processed by using the accurate skin segmentation result, so that the effect of the special effect processing can be improved.
In order to make the aforementioned objects, features and advantages of the present disclosure more comprehensible, preferred embodiments accompanied with figures are described in detail below.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present disclosure, the drawings required for use in the embodiments are briefly described below. The drawings are incorporated in and form a part of the specification; they illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the technical solutions of the present disclosure. It should be understood that the following drawings depict only certain embodiments of the disclosure and are therefore not to be considered limiting of its scope; those of ordinary skill in the art can derive additional related drawings from them without any inventive effort.
Fig. 1 shows a flowchart of an image processing method provided by an embodiment of the present disclosure;
fig. 2 is a schematic diagram of a skin segmentation image provided by an embodiment of the present disclosure;
FIG. 3 illustrates a flowchart of a skin segmentation process for processing an image provided by an embodiment of the present disclosure;
fig. 4 is a flowchart illustrating a method for determining image feature information corresponding to a first feature dimension according to an embodiment of the present disclosure;
fig. 5 shows a flowchart of a method for performing segmentation processing on a skin region of an image to be processed according to an embodiment of the present disclosure;
FIG. 6 is a flow chart illustrating a fusion process provided by an embodiment of the present disclosure;
FIG. 7 is a schematic diagram illustrating a deep neural network segmentation process of a skin region of an image to be processed according to an embodiment of the present disclosure;
FIG. 8 illustrates a schematic diagram of a fusion structure provided by an embodiment of the present disclosure;
FIG. 9 illustrates a flow chart of a method of training a deep neural network provided by an embodiment of the present disclosure;
fig. 10 shows a schematic diagram of an image processing apparatus provided by an embodiment of the present disclosure;
fig. 11 shows a schematic structural diagram of a computer device provided by an embodiment of the present disclosure.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present disclosure more clear, the technical solutions of the embodiments of the present disclosure will be described clearly and completely with reference to the drawings in the embodiments of the present disclosure, and it is obvious that the described embodiments are only a part of the embodiments of the present disclosure, not all of the embodiments. The components of embodiments of the present disclosure, as generally described and illustrated herein, may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present disclosure is not intended to limit the scope of the disclosure, as claimed, but is merely representative of selected embodiments of the disclosure. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the disclosure without making creative efforts, shall fall within the protection scope of the disclosure.
Furthermore, the terms "first," "second," and the like in the description and in the claims, and in the drawings described above, in the embodiments of the present disclosure are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein.
Reference herein to "a plurality or a number" means two or more. "and/or" describes the association relationship of the associated objects, meaning that there may be three relationships, e.g., a and/or B, which may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship.
Research shows that image processing is becoming increasingly diverse. In many scenarios, a user needs to apply special effect processing such as beautification to regions of an image such as the face or hands in order to change the appearance of the skin, for example by skin smoothing or whitening. All of these operations require knowing in advance which pixel points in the image belong to skin; that is, the image must first undergo skin segmentation. Most existing skin segmentation approaches segment the skin region of the image to be processed based on color space analysis. Such approaches are easily affected by the background color of the image or by the illumination intensity at the time the image was captured, which reduces the precision of the skin segmentation result and, in turn, degrades the effect of the image special effect processing.
Based on the above research, the present disclosure provides an image processing method, an image processing apparatus, a computer device, and a storage medium, which can determine image feature information reflecting a depth feature of an image to be processed by using a trained deep neural network, and can accurately determine whether each pixel point in the image to be processed belongs to a skin based on the depth feature of the image to be processed. Furthermore, the special effect processing is carried out on the image to be processed by using the accurate skin segmentation result, so that the effect of the special effect processing can be improved.
The drawbacks described above are the result of the inventors' practical and careful study. Therefore, the discovery of these problems and the solutions that the present disclosure proposes for them should both be regarded as contributions made by the inventors in the course of arriving at the present disclosure.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures.
To facilitate understanding of the present embodiment, first, an image processing method disclosed in the embodiments of the present disclosure is described in detail, where an execution subject of the image processing method provided in the embodiments of the present disclosure is generally a computer device with certain computing capability, and in some possible implementations, the image processing method may be implemented by a processor calling a computer readable instruction stored in a memory.
The following describes an image processing method provided by an embodiment of the present disclosure, taking an execution subject as a computer device as an example.
As shown in fig. 1, a flowchart of an image processing method provided in an embodiment of the present disclosure may include the following steps:
s101: and acquiring an image to be processed.
Here, at least a part of the skin area of the at least one target object may be included in the image to be processed, at least a part of the skin area may have different skin characteristics, and the skin characteristics may include skin color, skin smoothness, and the like. For example, skin color may include yellow skin, white skin, black skin, and the like.
S102: and performing segmentation processing on the skin area of the image to be processed by using the pre-trained deep neural network to obtain a skin segmentation image corresponding to the image to be processed.
Wherein the skin segmentation image identifies skin regions and non-skin regions.
After the image to be processed is obtained, the image to be processed can be input to a pre-trained deep neural network, and the pre-trained deep neural network can determine semantic features of each pixel point in the image to be processed based on the extracted image feature information capable of reflecting the depth features of the image to be processed. Then, based on the semantic features of the pixel points, the pixel points belonging to the skin and the pixel points not belonging to the skin in the image to be processed are determined, and then the segmentation processing of the skin area of the image to be processed is completed, so that a skin segmentation image corresponding to the image to be processed is obtained.
The skin segmentation image may be a mask image indicating the positions of pixel points belonging to the skin in the image to be processed. Fig. 2 is a schematic diagram of a skin segmentation image according to an embodiment of the present disclosure.
And, the resulting skin segmentation image may be an image of the same size and image resolution as the image to be processed.
In addition, the pre-trained deep neural network mentioned in the embodiment of the present disclosure can also be adjusted according to the processing capability of the computer device to which the deep neural network is applied, and is compatible with various computer devices on the basis of not affecting the skin segmentation precision.
The pre-trained deep neural network may be an mfnv2 neural network.
S103: and performing special effect processing on a skin area and/or a non-skin area in the image to be processed based on the skin segmentation image and the image to be processed.
In this step, after the skin segmentation image is obtained, a request for performing special effect processing on a skin region and/or a non-skin region in the image to be processed may be received, and then an image range in the image to be processed, which needs to be subjected to special effect processing, may be determined based on the positions of each pixel point belonging to the skin and indicated in the skin segmentation image and the positions of each pixel point in the image to be processed, so as to complete special effect processing on the skin region and/or the non-skin region in the image to be processed.
The special effect treatment for the skin area may include, for example, a skin color changing treatment, a whitening treatment, a skin filter, a beautifying treatment, and the like, which is not limited herein. The special effect processing for the non-skin area may include, for example, brightness adjustment, background blurring, and the like, which is not limited herein.
Therefore, the trained deep neural network can be used for determining the image characteristic information reflecting the depth characteristics of the image to be processed, whether each pixel point in the image to be processed belongs to the skin can be accurately determined based on the depth characteristics of the image to be processed, and the scheme for skin segmentation by using the deep learning mode can overcome the interference of the illumination intensity and the background color on the skin segmentation in the prior art and improve the anti-interference performance and the accuracy of the skin segmentation. Furthermore, the special effect processing is carried out on the image to be processed by using the accurate skin segmentation result, so that the effect of the special effect processing can be improved.
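As a concrete, non-normative illustration of how the skin segmentation image obtained in S102 could drive the special effect processing of S103, a simple whitening-style effect restricted to skin pixels might be applied as follows; the NumPy representation, the mask encoding, and the brightness increment are assumptions made for the example.

    import numpy as np

    def apply_special_effect(image_to_process: np.ndarray,
                             skin_segmentation: np.ndarray,
                             brightness_boost: int = 30) -> np.ndarray:
        """image_to_process: HxWx3 uint8 image acquired in S101.
        skin_segmentation: HxW mask from S102, 1 for the skin region, 0 otherwise.
        Applies a simple whitening-style special effect to the skin region only (S103)."""
        result = image_to_process.astype(np.int16)
        result[skin_segmentation.astype(bool)] += brightness_boost   # brighten skin pixels only
        return np.clip(result, 0, 255).astype(np.uint8)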
In an embodiment, regarding S102, the segmentation processing of the skin region may be performed on the image to be processed according to the method shown in fig. 3. As shown in fig. 3, a flowchart of the skin segmentation processing performed on the image to be processed provided by an embodiment of the present disclosure may include the following steps:
s301: and extracting image characteristic information and structural information of the image to be processed by utilizing a pre-trained deep neural network.
Here, the image feature information may include color feature information, feature points, structured information, and the like, where the feature points may be determined according to pixel points in the image to be processed, and have a corresponding relationship with the pixel points in the image to be processed. The color feature information can reflect the color of the feature point. The structured information is information for characterizing the positional relationship of feature points in the image feature information.
In specific implementation, after the image to be processed is input to the pre-trained deep neural network, the deep neural network can extract the image feature information of the image to be processed together with the structured information of that image feature information.
S302: and performing segmentation processing on the skin area of the image to be processed based on the extracted image characteristic information and the extracted structural information to obtain a skin segmentation image corresponding to the image to be processed.
In this step, it can be determined whether the feature point belongs to the skin region according to the color feature information and the structural information of the feature point in the image feature information. Therefore, the skin area and the non-skin area of the image to be processed can be determined, the segmentation processing of the image to be processed is completed, and the skin segmentation image is obtained.
In one embodiment, for S302, it may be determined that the segmentation process of the image to be processed is completed according to the following steps:
firstly, extracting image characteristic information corresponding to different characteristic dimensions in an image to be processed by utilizing a pre-trained deep neural network.
The plurality of feature dimensions comprise a first feature dimension and a second feature dimension which are adjacent to each other, and the first feature dimension is lower than the second feature dimension.
The image feature information corresponding to the first feature dimension is determined based on the image feature information corresponding to the second feature dimension and the structural information of the image feature information corresponding to the second feature dimension; the structural information of the image feature information corresponding to the second feature dimension is obtained by extracting the structural information by using a pre-trained deep neural network.
In this step, the pre-trained deep neural network includes feature extractors respectively corresponding to a plurality of feature dimensions; each feature extractor may extract image feature information in its corresponding feature dimension. Based on a plurality of feature extractors, image feature information respectively corresponding to different feature dimensions in the image to be processed can be respectively extracted. For example, a pre-trained deep neural network may include 4 feature extractors, which can extract image feature information in 4 feature dimensions.
The characteristic dimension may be an image resolution, and the image to be processed has an initial image resolution.
In specific implementation, firstly, image characteristic information and structural information corresponding to the initial image resolution of the image to be processed can be extracted by using a pre-trained deep neural network. Then, the initial image resolution is used as a second feature dimension, and image feature information corresponding to the first feature dimension is determined based on the image feature information and the structural information corresponding to the second feature dimension. Moreover, while the image feature information corresponding to the first feature dimension is determined, the structured information of the image feature information can also be determined.
Then, the first feature dimension may be used as a new second feature dimension, and image feature information and structured information corresponding to a next first feature dimension lower than the new second feature dimension may be determined. Therefore, the image feature information respectively corresponding to different feature dimensions in the image to be processed and the structural information in the image feature information can be respectively extracted. And the characteristic dimension corresponding to the initial image resolution is the highest characteristic dimension.
And secondly, performing segmentation processing on skin areas of the image to be processed based on the image characteristic information respectively corresponding to different characteristic dimensions to obtain a skin segmentation image corresponding to the image to be processed.
The image feature information under the high feature dimension can reflect the depth feature of the edge part of the image to be processed, and the image feature information under the low feature dimension can reflect the depth feature of the main part of the image to be processed.
In specific implementation, based on the image feature information corresponding to different feature dimensions, a main area and an edge area belonging to a skin area in an image to be processed can be determined, and then the skin area in the image to be processed can be determined, that is, whether each pixel point in the image to be processed belongs to a pixel point corresponding to the skin can be determined.
Further, based on the determined skin area, the segmentation processing of the image to be processed can be completed, and a skin segmentation image corresponding to the image to be processed is obtained.
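To illustrate the idea of feature extractors operating at several feature dimensions (equated here with image resolutions), a sketch is given below; the number of extractors, the convolutional layers, and the PyTorch implementation are assumptions and do not reproduce the actual network of the disclosure.

    import torch
    import torch.nn as nn

    class MultiScaleExtractor(nn.Module):
        """Hypothetical stack of 4 feature extractors; each halves the spatial
        resolution, so later outputs correspond to lower feature dimensions."""
        def __init__(self, in_channels=3, width=16):
            super().__init__()
            self.stages = nn.ModuleList()
            channels = in_channels
            for _ in range(4):
                self.stages.append(nn.Sequential(
                    nn.Conv2d(channels, width, kernel_size=3, stride=2, padding=1),
                    nn.ReLU(inplace=True)))
                channels, width = width, width * 2

        def forward(self, image):
            features = []                 # image feature information per feature dimension
            x = image
            for stage in self.stages:     # from the highest to the lowest feature dimension
                x = stage(x)
                features.append(x)
            return features               # features[-1] corresponds to the lowest feature dimension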
In one embodiment, the structured information of the image feature information corresponding to the second feature dimension includes a positional relationship between the first feature points in the image feature information corresponding to the second feature dimension. Regarding the step of determining the image feature information corresponding to the first feature dimension, a method shown in fig. 4 may be adopted, and as shown in fig. 4, a flowchart of a method for determining the image feature information corresponding to the first feature dimension provided by the embodiment of the present disclosure may include the following steps:
s401: and for each second feature point in the first feature dimension, screening a first target feature point corresponding to the second feature point from the first feature points corresponding to the second feature dimension based on the position information of the second feature point.
Here, the image feature information corresponding to each feature dimension includes different numbers of feature points, and the number of second feature points in the image feature information corresponding to the first feature dimension is smaller than the number of first feature points in the image feature information corresponding to the second feature dimension. That is, the number of feature points corresponding to the low image resolution is smaller than the number of feature points corresponding to the high image resolution.
Each second feature point in the first feature dimension has a corresponding first feature point in the second feature dimension.
In this step, for each second feature point in the first feature dimension, the position information of the second feature point may be determined, and for each first feature point in the second feature dimension, the position information of the first feature point may also be determined. Then, based on the position information of each second feature point in the first feature dimension and the position information of each first feature point in the second feature dimension, a first feature point and a second feature point located at the same position can be identified in the second feature dimension and the first feature dimension respectively, and that first feature point is taken as the screened-out first target feature point corresponding to the second feature point. That is, the first target feature point corresponding to each second feature point in the first feature dimension may be determined from among the first feature points corresponding to the second feature dimension.
S402: based on the first feature dimension and the second feature dimension, a target number of second feature points in the first feature dimension corresponding to the first feature points in the second feature dimension is determined.
Here, the image feature information of one second feature point in the first feature dimension may be determined from the image feature information of a plurality of first feature points in the second feature dimension.
In specific implementation, the target number of first feature points in the second feature dimension that correspond to one second feature point in the first feature dimension may be determined based on the conversion relationship between the first feature dimension and the second feature dimension. For example, one second feature point in the first feature dimension may correspond to 10 first feature points in the second feature dimension.
S403: and screening second target characteristic points of the target quantity from the first characteristic points corresponding to the second characteristic dimensions based on the position relation among the first characteristic points and the position information of the first target characteristic points.
In this step, based on the structured information corresponding to the second feature dimension, a position relationship between first feature points in the second feature dimension may be determined, and then, for each determined first target feature point, a target number of first feature points may be selected and screened from the first feature points corresponding to the second feature dimension as second target feature points according to the position relationship between the position information of the first target feature point and the first feature points.
In specific implementation, according to the position information of the first target feature point and the positional relationship between the first feature points, the target number of first feature points located within a preset distance of the first target feature point are screened from the first feature points corresponding to the second feature dimension and taken as the second target feature points.
S404: and determining image feature information of the second feature points based on the image feature information of the second target feature points, and determining image feature information corresponding to the first feature dimension based on the image feature information of each second feature point in the determined first feature dimension.
Here, from the image feature information of each of the determined target number of second target feature points, the image feature information of the second feature point in the first feature dimension corresponding to the second target feature point may be determined.
Further, based on the above steps, image feature information of each second feature point in the first feature dimension may be determined, and based on the image feature information of each second feature point, image feature information corresponding to the first feature dimension may be determined.
Thus, the image feature information respectively corresponding to different feature dimensions in the image to be processed can be respectively extracted.
In specific implementation, the feature extractor corresponding to the first feature dimension may downsample the image feature information corresponding to the second feature dimension to determine the image feature information corresponding to the first feature dimension.
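Purely as an illustration of S401–S404, the sketch below derives each second feature point from a target number of neighboring first feature points by average pooling over a local window; the window size (the assumed "target number"), the pooling operation, and the NumPy layout (which itself encodes the positional relationship between feature points) are assumptions, since the disclosure leaves the concrete aggregation open.

    import numpy as np

    def downsample_features(second_dim_features: np.ndarray, stride: int = 2) -> np.ndarray:
        """second_dim_features: [H, W, C] image feature information of the second
        (higher) feature dimension; the array layout encodes the structured
        information, i.e. the positional relationship between first feature points.
        Each second feature point is computed from the stride*stride first feature
        points around its first target feature point (the assumed target number)."""
        H, W, C = second_dim_features.shape
        h, w = H // stride, W // stride
        first_dim_features = np.zeros((h, w, C), dtype=np.float32)
        for i in range(h):
            for j in range(w):
                block = second_dim_features[i * stride:(i + 1) * stride,
                                            j * stride:(j + 1) * stride]
                first_dim_features[i, j] = block.mean(axis=(0, 1))   # assumed aggregation
        return first_dim_features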
Further, after obtaining image feature information corresponding to different feature dimensions, the skin region segmentation processing may be performed on the image to be processed according to the method shown in fig. 5, and as shown in fig. 5, a flowchart of a method for performing skin region segmentation processing on the image to be processed according to an embodiment of the present disclosure may include the following steps:
s501: and determining a first semantic prediction result of the image to be processed under the characteristic dimension based on the image characteristic information corresponding to the characteristic dimension for each characteristic dimension in different characteristic dimensions.
Here, the first semantic prediction result is used to represent the probability that a pixel point in the image to be processed is a pixel point corresponding to the skin.
In specific implementation, for each feature dimension, after the feature extractor corresponding to the feature dimension in the deep neural network determines the image feature information corresponding to the feature dimension, a first semantic prediction result of the image to be processed in the feature dimension may be determined according to the image feature information corresponding to the feature dimension.
Furthermore, based on each feature extractor in the deep neural network, a first semantic prediction result of the image to be processed under different feature dimensions can be determined.
In one embodiment, for S501, a first semantic prediction result of the image to be processed in each feature dimension may be determined according to the following steps:
the method comprises the steps of firstly, aiming at the lowest feature dimension, and determining a first semantic prediction result of an image to be processed under the lowest feature dimension based on image feature information corresponding to the lowest feature dimension.
After the image feature information corresponding to each feature dimension is obtained, for the lowest feature dimension, the classifier corresponding to the lowest feature dimension may output a first semantic prediction result of the image to be processed in the lowest feature dimension according to the image feature information corresponding to the lowest feature dimension.
Step two: for each second feature dimension other than the lowest feature dimension, determine a first semantic prediction result of the image to be processed in that second feature dimension based on the image feature information corresponding to the second feature dimension and the first semantic prediction result of the image to be processed in the corresponding first feature dimension.
Here, since a first feature dimension is lower than its adjacent second feature dimension, the lowest feature dimension is necessarily a first feature dimension. After the classifier corresponding to the lowest feature dimension determines the first semantic prediction result in that dimension, the classifier corresponding to the adjacent second feature dimension may determine the first semantic prediction result of the image to be processed in that second feature dimension based on the first semantic prediction result in the lowest feature dimension and the image feature information in the second feature dimension. In the same way, the classifier corresponding to each second feature dimension may determine the first semantic prediction result in that second feature dimension based on the first semantic prediction result in the corresponding first feature dimension and the image feature information in the second feature dimension.
In a specific implementation, the classifier corresponding to each second feature dimension may upsample the first semantic prediction result of the lower feature dimension, combine it with the image feature information of the second feature dimension, and thereby determine the first semantic prediction result of that feature dimension, as sketched below.
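A minimal sketch of such a per-dimension classifier, assuming (the disclosure does not specify this) that the first semantic prediction result is a single-channel probability map, that the lower-dimension result is upsampled bilinearly, and that a 1x1 convolution fuses it with the feature map; the class name, channel sizes, and shapes are illustrative.

```python
from typing import Optional
import torch
import torch.nn as nn
import torch.nn.functional as F

class DimensionClassifier(nn.Module):
    """Hypothetical per-dimension classifier: combines this dimension's feature
    map with the upsampled first semantic prediction result of the next lower
    feature dimension (pass None for the lowest feature dimension)."""
    def __init__(self, in_channels: int):
        super().__init__()
        self.head = nn.Conv2d(in_channels + 1, 1, kernel_size=1)  # +1 for the lower prediction

    def forward(self, feat: torch.Tensor, lower_pred: Optional[torch.Tensor]) -> torch.Tensor:
        if lower_pred is None:                       # lowest feature dimension: features only
            lower_pred = torch.zeros_like(feat[:, :1])
        else:                                        # upsample the lower-dimension result
            lower_pred = F.interpolate(lower_pred, size=feat.shape[2:],
                                       mode='bilinear', align_corners=False)
        return torch.sigmoid(self.head(torch.cat([feat, lower_pred], dim=1)))

# usage sketch: classifier for feature dimension X3 taking the X4 result as input
clf_x3 = DimensionClassifier(in_channels=64)
pred_x4 = torch.rand(1, 1, 32, 32)                   # first semantic prediction result, dim X4
feat_x3 = torch.randn(1, 64, 64, 64)                 # image feature information, dim X3
pred_x3 = clf_x3(feat_x3, pred_x4)                   # first semantic prediction result, dim X3
```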
S502: Determine the probability that each pixel point in the image to be processed is a pixel point corresponding to skin, based on the first semantic prediction results of the image to be processed in the individual feature dimensions.
After the first semantic prediction results in the individual feature dimensions are obtained, fusion processing can be performed multiple times in order of feature dimension from low to high, which yields the probability that each pixel point in the image to be processed is a pixel point corresponding to skin.
S503: Segment the skin region of the image to be processed based on the probability that each pixel point in the image to be processed corresponds to skin and a preset segmentation probability value.
In a specific implementation, the probability that each pixel point in the image to be processed corresponds to skin can be compared with the preset segmentation probability value: a pixel point whose probability is greater than the preset segmentation probability value is taken as a skin pixel point, and a pixel point whose probability is not greater than that value is taken as a non-skin pixel point.
In this way, the pixel points belonging to the skin region and those belonging to the non-skin region in the image to be processed can be determined, and based on this result the segmentation of the image to be processed is completed, giving the skin segmentation image.
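A minimal sketch of this thresholding step; the 0.5 segmentation probability value is an assumption, not a value given in the disclosure.

```python
import torch

# Binarise the per-pixel skin probabilities with a preset segmentation
# probability value to obtain the skin segmentation image (1 = skin, 0 = non-skin).
def segment_skin(prob_map: torch.Tensor, seg_threshold: float = 0.5) -> torch.Tensor:
    # prob_map: [H, W] probability that each pixel point corresponds to skin
    return (prob_map > seg_threshold).to(torch.uint8)

prob_map = torch.rand(256, 256)          # stand-in for the fused per-pixel probabilities
skin_mask = segment_skin(prob_map)       # skin segmentation image
```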
In an embodiment, the ith fusion process of the multiple fusion processes may be performed according to the procedure shown in fig. 6. As shown in fig. 6, a flowchart of a fusion process provided by an embodiment of the present disclosure may include the following steps:
S601: Determine confidence information of the first semantic prediction result in the first feature dimension.
Here, in the first fusion process the first feature dimension is the lowest feature dimension.
In a specific implementation, the second feature dimension corresponding to the first feature dimension may be determined, together with the fusion structure corresponding to that second feature dimension. The fusion structure corresponding to the second feature dimension then determines the confidence of the first semantic prediction result in the first feature dimension according to its activation function, giving the confidence information.
The activation function may be the softmax function.
S602: Fuse the first semantic prediction result in the first feature dimension and the first semantic prediction result in the second feature dimension using the confidence information of the first semantic prediction result in the first feature dimension, to obtain a target semantic prediction result in the second feature dimension.
In this step, after obtaining the confidence information of the first semantic prediction result in the first feature dimension, the fusion structure may fuse the first semantic prediction result in the first feature dimension and the first semantic prediction result in the second feature dimension based on the confidence information to obtain the target semantic prediction result in the second feature dimension.
In a specific implementation, for each pixel point, the confidence of its first semantic prediction result in the first feature dimension is obtained from the confidence information and compared with a preset confidence threshold. If the confidence is not less than the preset confidence threshold, the first semantic prediction result of that pixel point in the first feature dimension is taken as the target semantic prediction result in the second feature dimension; if the confidence is less than the preset confidence threshold, the first semantic prediction result of that pixel point in the second feature dimension is taken as the target semantic prediction result.
Then, the second feature dimension may be treated as a new first feature dimension, and the target semantic prediction result in the second feature dimension is used as the first semantic prediction result of that new first feature dimension in the (i + 1)th fusion process.
Based on the steps, a target semantic prediction result under the highest feature dimension can be determined, wherein the target semantic prediction result is also used for representing the probability that each pixel point in the image to be processed is the pixel point corresponding to the skin.
Therefore, based on the target semantic prediction result under the highest characteristic dimension, the probability that each pixel point in the image to be processed is the pixel point corresponding to the skin can be determined.
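A minimal sketch of one such fusion step, assuming two-class (skin / non-skin) logit maps, a softmax-derived confidence, bilinear upsampling of the lower-dimension result, and a 0.9 confidence threshold; all of these specifics are assumptions rather than values from the disclosure.

```python
import torch
import torch.nn.functional as F

# One fusion step (the ith of the multiple fusion processes): where the
# lower-dimension result is confident enough, keep it; otherwise fall back to
# the result of the second (higher) feature dimension.
def fuse_step(low_logits: torch.Tensor, high_logits: torch.Tensor,
              conf_threshold: float = 0.9) -> torch.Tensor:
    # low_logits: [N, 2, h, w], high_logits: [N, 2, H, W] skin / non-skin logits
    low_up = F.interpolate(low_logits, size=high_logits.shape[2:],
                           mode='bilinear', align_corners=False)
    confidence = F.softmax(low_up, dim=1).max(dim=1, keepdim=True).values
    keep_low = confidence >= conf_threshold            # preset confidence threshold
    return torch.where(keep_low, low_up, high_logits)  # target result in the second dimension

# the output serves as the new "first dimension" result in the (i + 1)th fusion step
target_x3 = fuse_step(torch.randn(1, 2, 32, 32), torch.randn(1, 2, 64, 64))
```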
As shown in fig. 7, which is a schematic diagram of skin region segmentation performed on an image to be processed by a deep neural network provided by an embodiment of the present disclosure, the feature extractor a, feature extractor b, feature extractor c and feature extractor d correspond to different feature dimensions and can extract the image feature information of the image to be processed in those dimensions. The feature extractor a extracts the image feature information corresponding to feature dimension X1, the feature extractor b that corresponding to feature dimension X2, the feature extractor c that corresponding to feature dimension X3, and the feature extractor d that corresponding to feature dimension X4, wherein X1 is higher than X2, X2 is higher than X3, and X3 is higher than X4.

The classifier a, classifier b, classifier c and classifier d are the classifiers corresponding to the different feature dimensions. The classifier d determines the first semantic prediction result in feature dimension X4 based on the image feature information corresponding to X4; the classifier c determines the first semantic prediction result in X3 based on the first semantic prediction result in X4 and the image feature information corresponding to X3; the classifier b determines the first semantic prediction result in X2 based on the first semantic prediction result in X3 and the image feature information corresponding to X2; and the classifier a determines the first semantic prediction result in X1 based on the first semantic prediction result in X2 and the image feature information corresponding to X1.

The fusion structure c determines the target semantic prediction result in feature dimension X3, corresponding to classifier c, based on the first semantic prediction results output by classifier d and classifier c; the fusion structure b determines the target semantic prediction result in X2, corresponding to classifier b, based on the target semantic prediction result output by fusion structure c and the first semantic prediction result output by classifier b; and the fusion structure a determines the target semantic prediction result in X1, corresponding to classifier a, based on the target semantic prediction result output by fusion structure b and the first semantic prediction result output by classifier a. The deep neural network then outputs the skin segmentation image based on the target semantic prediction result in feature dimension X1.
As shown in fig. 8, which is a schematic diagram of fusion by a fusion structure provided by an embodiment of the present disclosure, the fusion structure determines the confidence information Low confidence of the first semantic prediction result Low prediction in the first feature dimension using the softmax activation function, and then determines the target semantic prediction result Final prediction in the second feature dimension from the first semantic prediction result High prediction in the second feature dimension, the confidence Low confidence, and the first semantic prediction result Low prediction in the first feature dimension.
In another embodiment, in the image processing method provided by the embodiment of the present disclosure, after the probability that each pixel point in the image to be processed corresponds to skin has been determined from the target semantic prediction result in the highest feature dimension, the image to be processed need not be segmented with the preset segmentation probability value. Instead, based on a received special effect processing request, each pixel point can be processed to a different degree according to its probability.
Taking a received special effect processing request for whitening the pixel points belonging to the skin region as an example, according to the probability corresponding to each pixel point, first-level whitening may be applied to pixel points with a probability greater than 80%, second-level whitening to pixel points with a probability in the range of 60%-80%, and third-level whitening to pixel points with a probability less than 60%, thereby completing the special effect processing.
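A minimal sketch of such probability-graded processing, assuming (the disclosure does not say) that the first-level whitening is the strongest and that whitening blends each pixel towards white; the blend strengths are illustrative only.

```python
import numpy as np

# Probability-graded whitening: pixels more likely to be skin receive the
# first-level (here: strongest) whitening; the strengths are assumptions.
def graded_whitening(image: np.ndarray, skin_prob: np.ndarray) -> np.ndarray:
    # image: HxWx3 uint8 image to be processed, skin_prob: HxW skin probabilities in [0, 1]
    out = image.astype(np.float32)
    levels = [(skin_prob > 0.8, 0.4),                        # first-level whitening
              ((skin_prob > 0.6) & (skin_prob <= 0.8), 0.2), # second-level whitening
              (skin_prob <= 0.6, 0.05)]                      # third-level whitening
    for mask, strength in levels:
        out[mask] = out[mask] * (1 - strength) + 255.0 * strength  # blend towards white
    return np.clip(out, 0, 255).astype(np.uint8)
```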
In an embodiment, since the deep neural network provided by the embodiment of the present disclosure is trained in advance, the embodiment of the present disclosure further provides steps for training the deep neural network. As shown in fig. 9, a flowchart of a method for training the deep neural network provided by an embodiment of the present disclosure may include the following steps:
S901: Acquire a sample image.
Wherein the sample image contains a skin region.
Here, at least a part of the skin area of at least one sample object may be included in the sample image, and different sample images may also have different image characteristics such as image background, image color, and the like.
S902: Input the sample image into the deep neural network to be trained, and determine a predicted segmentation image of the sample image.
Here, after the sample image is input to the deep neural network to be trained, the deep neural network to be trained may output a predicted segmentation image obtained after segmentation processing is performed on the sample image.
S903: Generate a target loss based on the sample segmentation image corresponding to the sample image and the predicted segmentation image, and train the deep neural network to be trained using the target loss, to obtain the trained deep neural network.
The sample segmentation image is marked with skin identification information, which identifies the skin region and the non-skin region in the sample image.
Here, the sample segmentation image corresponding to the sample image may be obtained by performing segmentation processing on the sample image in advance according to the skin region in the sample image.
Then, the target loss between the sample segmentation image and the predicted segmentation image can be determined, and the deep neural network to be trained can be trained with this target loss; when the training cut-off condition is reached, training is completed and the trained deep neural network is obtained.
In this way, the trained deep neural network can output an accurate predicted segmentation image, thereby achieving accurate segmentation of the skin region.
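A minimal training sketch for these steps, assuming the network outputs a per-pixel skin probability map of the same shape as the sample segmentation image and that a binary cross-entropy loss serves as the target loss; skin_net, loader, and the epoch budget used as the training cut-off condition are all hypothetical.

```python
import torch
import torch.nn as nn

# loader is assumed to yield (sample_image, sample_mask) pairs, where sample_mask
# is the sample segmentation image marked with skin identification information
# (1 = skin, 0 = non-skin), shaped like the network output.
def train(skin_net: nn.Module, loader, epochs: int = 10, lr: float = 1e-3) -> nn.Module:
    optimizer = torch.optim.Adam(skin_net.parameters(), lr=lr)
    criterion = nn.BCELoss()                      # target loss between the two images
    for _ in range(epochs):                       # training cut-off condition: epoch budget
        for sample_image, sample_mask in loader:
            pred_mask = skin_net(sample_image)    # predicted segmentation image (probabilities)
            loss = criterion(pred_mask, sample_mask.float())
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return skin_net
```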
In one possible implementation, for S901, first sample images with different skin features may be acquired first. The skin features may include skin color, skin smoothness, and the like. For example, skin color may include yellow skin, white skin, black skin, and so on.
Then, an image brightness changing operation may be performed on at least a portion of the acquired first sample image, resulting in a second sample image. Further, a second sample image having a different image brightness may be obtained.
Finally, the first sample image and the second sample image may be taken as final sample images.
Therefore, the sample images with different skin characteristics are used for training the deep neural network to be trained, and the skin segmentation precision of the trained deep neural network in skin segmentation of the images to be processed with different skin characteristics can be improved. In addition, the second sample image with the changed image brightness is used for training the deep neural network to be trained, so that the adaptability of the trained deep neural network to the image brightness change can be improved, and the skin segmentation precision is further improved.
In one embodiment, for S901, after the first sample image and the second sample image are acquired, a third sample image under different illumination intensities may be acquired, and the first sample image, the second sample image and the third sample image are taken together as a sample image.
In this way, the third sample images acquired under different illumination intensities are used together with the first sample images and the second sample images to train the deep neural network to be trained; this increases the number of sample images available for training and improves the adaptability of the trained deep neural network to illumination changes, thereby improving the skin segmentation precision.
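A minimal sketch of building the sample set from first and second sample images; the brightness gain range and the choice of which first sample images are altered are assumptions (third sample images captured under different illumination intensities would simply be appended to the returned list).

```python
import numpy as np

# Image brightness changing operation used to derive second sample images from
# part of the first sample images; gain < 1 darkens, gain > 1 brightens.
def change_brightness(image: np.ndarray, gain: float) -> np.ndarray:
    # image: HxWx3 uint8 first sample image
    return np.clip(image.astype(np.float32) * gain, 0, 255).astype(np.uint8)

def build_sample_set(first_samples, rng=np.random.default_rng(0)):
    second_samples = [change_brightness(img, rng.uniform(0.6, 1.4))  # assumed gain range
                      for img in first_samples[::2]]                 # at least part of them
    return list(first_samples) + second_samples                      # final sample images
```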
It will be understood by those skilled in the art that in the method of the present invention, the order of writing the steps does not imply a strict order of execution and any limitations on the implementation, and the specific order of execution of the steps should be determined by their function and possible inherent logic.
Based on the same inventive concept, an image processing apparatus corresponding to the image processing method is also provided in the embodiments of the present disclosure, and since the principle of the apparatus in the embodiments of the present disclosure for solving the problem is similar to the image processing method described above in the embodiments of the present disclosure, the implementation of the apparatus may refer to the implementation of the method, and repeated details are not described again.
As shown in fig. 10, a schematic diagram of an image processing apparatus provided in an embodiment of the present disclosure includes:
an obtaining module 1001 configured to obtain an image to be processed;
a segmentation module 1002, configured to perform segmentation processing on the skin region of the image to be processed using a pre-trained deep neural network to obtain a skin segmentation image corresponding to the image to be processed, where the skin segmentation image is marked with a skin region and a non-skin region;
a processing module 1003, configured to perform special effect processing on a skin region and/or a non-skin region in the image to be processed based on the skin segmentation image and the image to be processed.
In a possible implementation manner, the segmentation module 1002 is configured to extract image feature information and structural information of the image to be processed by using the deep neural network trained in advance;
and performing segmentation processing on the skin area of the image to be processed based on the extracted image characteristic information and the extracted structural information to obtain a skin segmentation image corresponding to the image to be processed.
In a possible implementation manner, the segmentation module 1002 is configured to extract, by using the deep neural network trained in advance, image feature information corresponding to different feature dimensions in the image to be processed; wherein the plurality of feature dimensions include a first feature dimension and a second feature dimension that are adjacent, the first feature dimension being lower than the second feature dimension; the image feature information corresponding to the first feature dimension is determined based on the image feature information corresponding to the second feature dimension and the structural information of the image feature information corresponding to the second feature dimension; the structural information of the image characteristic information corresponding to the second characteristic dimension is extracted by using the pre-trained deep neural network;
and carrying out segmentation processing on the skin area of the image to be processed based on the image characteristic information respectively corresponding to different characteristic dimensions to obtain a skin segmentation image corresponding to the image to be processed.
In a possible implementation manner, the structured information of the image feature information corresponding to the second feature dimension includes a positional relationship between first feature points in the image feature information corresponding to the second feature dimension;
the segmentation module 1002 is further configured to determine image feature information corresponding to the first feature dimension according to the following steps:
for each second feature point in the first feature dimension, based on the position information of the second feature point, screening a first target feature point corresponding to the second feature point from the first feature points corresponding to the second feature dimension;
determining, based on the first feature dimension and the second feature dimension, a target number of second feature points in the first feature dimension that correspond to first feature points in the second feature dimension;
screening second target feature points of the target number from the first feature points corresponding to the second feature dimensions based on the position relation among the first feature points and the position information of the first target feature points;
and determining the image feature information of the second feature point based on the image feature information of the second target feature point, and determining the image feature information corresponding to the first feature dimension based on the determined image feature information of each second feature point in the first feature dimension.
In a possible implementation manner, the segmentation module 1002 is configured to determine, for each feature dimension of the different feature dimensions, a first semantic prediction result of the image to be processed in the feature dimension based on image feature information corresponding to the feature dimension;
determining the probability that each pixel point in the image to be processed is a pixel point corresponding to skin based on a first semantic prediction result of the image to be processed under each feature dimension;
and carrying out segmentation processing on the skin area of the image to be processed based on the probability that each pixel point in the image to be processed is a pixel point corresponding to the skin and a preset segmentation probability value.
In a possible implementation manner, the segmentation module 1002 is configured to perform multiple times of fusion processing according to the order from low to high of the different feature dimensions, and then obtain a probability that each pixel point in the image to be processed is a pixel point corresponding to the skin;
wherein, the ith fusion processing in the multiple fusion processing comprises the following steps:
determining confidence information of a first semantic prediction result under the first feature dimension;
fusing the first semantic prediction result under the first characteristic dimension and the first semantic prediction result under the second characteristic dimension by using the confidence information of the first semantic prediction result under the first characteristic dimension to obtain a target semantic prediction result under the second characteristic dimension;
and updating the target semantic prediction result into a first semantic prediction result of a first feature dimension in the (i + 1) th fusion process.
In a possible implementation manner, the segmentation module 1002 is configured to determine, for a lowest feature dimension, a first semantic prediction result of the image to be processed in the lowest feature dimension based on image feature information corresponding to the lowest feature dimension;
and for each second feature dimension except the lowest feature dimension, determining a first semantic prediction result of the image to be processed under the second feature dimension based on the image feature information corresponding to the second feature dimension and the first semantic prediction result of the image to be processed under the first feature dimension.
In a possible embodiment, the apparatus further comprises:
a training module 1004 for training the deep neural network according to the following steps:
obtaining a sample image, the sample image containing a skin region;
inputting the sample image into a deep neural network to be trained, and determining a prediction segmentation image of the sample image;
and generating a target loss based on the sample segmentation image corresponding to the sample image and the prediction segmentation image, and training the deep neural network to be trained by using the target loss to obtain the trained deep neural network, wherein the sample segmentation image is marked with skin identification information.
In one possible implementation, the training module 1004 is configured to obtain a first sample image having different skin features;
performing an image brightness changing operation on at least part of the first sample images to obtain second sample images;
taking the first sample image and the second sample image as the sample images.
In a possible implementation, the training module 1004 is configured to acquire a third sample image under different illumination intensities;
taking the first sample image, the second sample image, and the third sample image as the sample images.
The description of the processing flow of each module in the device and the interaction flow between the modules may refer to the related description in the above method embodiments, and will not be described in detail here.
An embodiment of the present disclosure further provides a computer device, as shown in fig. 11, which is a schematic structural diagram of a computer device provided in an embodiment of the present disclosure, and includes:
a processor 1101 and a memory 1102, where the memory 1102 stores machine-readable instructions executable by the processor 1101 and the processor 1101 is configured to execute them. When the machine-readable instructions are executed, the processor 1101 performs the following steps: S101: acquiring an image to be processed; S102: performing segmentation processing on the skin region of the image to be processed using a pre-trained deep neural network to obtain a skin segmentation image corresponding to the image to be processed; and S103: performing special effect processing on a skin region and/or a non-skin region in the image to be processed based on the skin segmentation image and the image to be processed.
The storage 1102 includes a memory 1121 and an external storage 1122; the memory 1121 is also referred to as an internal memory, and is used to temporarily store operation data in the processor 1101 and data exchanged with the external memory 1122 such as a hard disk, and the processor 1101 exchanges data with the external memory 1122 via the memory 1121.
For the specific execution process of the instruction, reference may be made to the steps of the image processing method described in the embodiments of the present disclosure, and details are not described here.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to perform the steps of the image processing method described in the above method embodiments. The storage medium may be a volatile or non-volatile computer-readable storage medium.
The computer program product of the image processing method provided in the embodiments of the present disclosure includes a computer-readable storage medium storing a program code, where instructions included in the program code may be used to execute steps of the image processing method described in the above method embodiments, which may be referred to specifically for the above method embodiments, and are not described herein again.
The computer program product may be embodied in hardware, software or a combination thereof. In an alternative embodiment, the computer program product is embodied in a computer storage medium; in another alternative embodiment, it is embodied in a software product, such as a Software Development Kit (SDK).
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the system and the apparatus described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again. In the several embodiments provided in the present disclosure, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. The above-described embodiments of the apparatus are merely illustrative, and for example, the division of the units is only one logical division, and there may be other divisions when actually implemented, and for example, a plurality of units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection of devices or units through some communication interfaces, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a non-volatile computer-readable storage medium executable by a processor. Based on such understanding, the technical solution of the present disclosure may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present disclosure. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.
Finally, it should be noted that: the above-mentioned embodiments are merely specific embodiments of the present disclosure, which are used for illustrating the technical solutions of the present disclosure and not for limiting the same, and the scope of the present disclosure is not limited thereto, and although the present disclosure is described in detail with reference to the foregoing embodiments, those skilled in the art should understand that: any person skilled in the art can modify or easily conceive of the technical solutions described in the foregoing embodiments or equivalent technical features thereof within the technical scope of the present disclosure; such modifications, changes or substitutions do not depart from the spirit and scope of the embodiments of the present disclosure, and should be construed as being included therein. Therefore, the protection scope of the present disclosure shall be subject to the protection scope of the claims.

Claims (13)

1. An image processing method, comprising:
acquiring an image to be processed;
carrying out segmentation processing on a skin area on the image to be processed by utilizing a pre-trained deep neural network to obtain a skin segmentation image corresponding to the image to be processed, wherein the skin segmentation image is marked with a skin area and a non-skin area;
and performing special effect processing on a skin area and/or a non-skin area in the image to be processed based on the skin segmentation image and the image to be processed.
2. The method according to claim 1, wherein the segmenting the skin region of the image to be processed by using a pre-trained deep neural network to obtain a skin segmentation image corresponding to the image to be processed comprises:
extracting image characteristic information and structural information of the image to be processed by utilizing the pre-trained deep neural network;
and performing segmentation processing on the skin area of the image to be processed based on the extracted image characteristic information and the extracted structural information to obtain a skin segmentation image corresponding to the image to be processed.
3. The method according to claim 2, wherein the performing segmentation processing on the skin region on the image to be processed based on the extracted image feature information and the structured information to obtain a skin segmentation image corresponding to the image to be processed comprises:
extracting image characteristic information corresponding to different characteristic dimensions in the image to be processed respectively by utilizing the pre-trained deep neural network; wherein the plurality of feature dimensions include a first feature dimension and a second feature dimension that are adjacent, the first feature dimension being lower than the second feature dimension; the image feature information corresponding to the first feature dimension is determined based on the image feature information corresponding to the second feature dimension and the structural information of the image feature information corresponding to the second feature dimension; the structural information of the image characteristic information corresponding to the second characteristic dimension is extracted by using the pre-trained deep neural network;
and carrying out segmentation processing on the skin area of the image to be processed based on the image characteristic information respectively corresponding to different characteristic dimensions to obtain a skin segmentation image corresponding to the image to be processed.
4. The method according to claim 3, wherein the structured information of the image feature information corresponding to the second feature dimension includes a positional relationship between first feature points in the image feature information corresponding to the second feature dimension;
the method further comprises the step of determining image feature information corresponding to the first feature dimension:
for each second feature point in the first feature dimension, based on the position information of the second feature point, screening a first target feature point corresponding to the second feature point from the first feature points corresponding to the second feature dimension;
determining, based on the first feature dimension and the second feature dimension, a target number of second feature points in the first feature dimension that correspond to first feature points in the second feature dimension;
screening second target feature points of the target number from the first feature points corresponding to the second feature dimensions based on the position relation among the first feature points and the position information of the first target feature points;
and determining the image feature information of the second feature point based on the image feature information of the second target feature point, and determining the image feature information corresponding to the first feature dimension based on the determined image feature information of each second feature point in the first feature dimension.
5. The method according to claim 3 or 4, wherein the segmentation processing of the skin region on the image to be processed based on the image feature information respectively corresponding to different feature dimensions comprises:
for each feature dimension in the different feature dimensions, determining a first semantic prediction result of the image to be processed under the feature dimension based on image feature information corresponding to the feature dimension;
determining the probability that each pixel point in the image to be processed is a pixel point corresponding to skin based on a first semantic prediction result of the image to be processed under each feature dimension;
and carrying out segmentation processing on the skin area of the image to be processed based on the probability that each pixel point in the image to be processed is a pixel point corresponding to the skin and a preset segmentation probability value.
6. The method according to claim 5, wherein the determining, based on the first semantic prediction result of the image to be processed in each feature dimension, a probability that each pixel point in the image to be processed is a pixel point corresponding to a skin includes:
performing multiple times of fusion processing according to the sequence of the different feature dimensions from low to high to obtain the probability that each pixel point in the image to be processed is a pixel point corresponding to the skin;
wherein, the ith fusion processing in the multiple fusion processing comprises the following steps:
determining confidence information of a first semantic prediction result under the first feature dimension;
fusing the first semantic prediction result under the first characteristic dimension and the first semantic prediction result under the second characteristic dimension by using the confidence information of the first semantic prediction result under the first characteristic dimension to obtain a target semantic prediction result under the second characteristic dimension;
and updating the target semantic prediction result into a first semantic prediction result of a first feature dimension in the (i + 1) th fusion process.
7. The method according to claim 5 or 6, wherein the determining, for each of the different feature dimensions, a first semantic prediction result of the image to be processed in the feature dimension based on the image feature information corresponding to the feature dimension comprises:
aiming at the lowest feature dimension, determining a first semantic prediction result of the image to be processed under the lowest feature dimension based on image feature information corresponding to the lowest feature dimension;
and for each second feature dimension except the lowest feature dimension, determining a first semantic prediction result of the image to be processed under the second feature dimension based on the image feature information corresponding to the second feature dimension and the first semantic prediction result of the image to be processed under the first feature dimension.
8. The method of any one of claims 1 to 7, further comprising the step of training the deep neural network:
obtaining a sample image, the sample image containing a skin region;
inputting the sample image into a deep neural network to be trained, and determining a prediction segmentation image of the sample image;
and generating a target loss based on the sample segmentation image corresponding to the sample image and the prediction segmentation image, and training the deep neural network to be trained by using the target loss to obtain the trained deep neural network, wherein the sample segmentation image is marked with skin identification information.
9. The method of claim 8, wherein said obtaining a sample image comprises:
acquiring a first sample image having different skin characteristics;
performing an image brightness changing operation on at least part of the first sample images to obtain second sample images;
taking the first sample image and the second sample image as the sample images.
10. The method of claim 9, wherein said obtaining a sample image comprises:
acquiring a third sample image under different illumination intensities;
taking the first sample image, the second sample image, and the third sample image as the sample images.
11. An image processing apparatus characterized by comprising:
the acquisition module is used for acquiring an image to be processed;
the segmentation module is used for carrying out segmentation processing on a skin area on the image to be processed by utilizing a pre-trained deep neural network to obtain a skin segmentation image corresponding to the image to be processed, wherein the skin segmentation image is marked with a skin area and a non-skin area;
and the processing module is used for carrying out special effect processing on a skin area and/or a non-skin area in the image to be processed based on the skin segmentation image and the image to be processed.
12. A computer device, comprising: a processor, a memory storing machine-readable instructions executable by the processor, the processor for executing the machine-readable instructions stored in the memory, the processor performing the steps of the image processing method according to any one of claims 1 to 10 when the machine-readable instructions are executed by the processor.
13. A computer-readable storage medium, characterized in that a computer program is stored on the computer-readable storage medium, which computer program, when executed by a computer device, performs the steps of the image processing method according to any one of claims 1 to 10.
CN202110473429.2A 2021-04-29 2021-04-29 Image processing method, device, computer equipment and storage medium Active CN113129319B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110473429.2A CN113129319B (en) 2021-04-29 2021-04-29 Image processing method, device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110473429.2A CN113129319B (en) 2021-04-29 2021-04-29 Image processing method, device, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113129319A true CN113129319A (en) 2021-07-16
CN113129319B CN113129319B (en) 2023-06-23

Family

ID=76781170

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110473429.2A Active CN113129319B (en) 2021-04-29 2021-04-29 Image processing method, device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113129319B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017204864A1 (en) * 2016-05-26 2017-11-30 Raytheon Company Systems and methods for facilitating tracking a target in an imaged scene
WO2018177237A1 (en) * 2017-03-29 2018-10-04 腾讯科技(深圳)有限公司 Image processing method and device, and storage medium
CN111523546A (en) * 2020-04-16 2020-08-11 湖南大学 Image semantic segmentation method, system and computer storage medium
CN112258605A (en) * 2020-10-16 2021-01-22 北京达佳互联信息技术有限公司 Special effect adding method and device, electronic equipment and storage medium
CN112287763A (en) * 2020-09-27 2021-01-29 北京旷视科技有限公司 Image processing method, apparatus, device and medium
CN112651364A (en) * 2020-12-31 2021-04-13 北京市商汤科技开发有限公司 Image processing method, image processing device, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
MICHAEL MU-CHIEN HSU等: "Using Segmentation to Enhance Frame Prediction in a Multi-Scale Spatial-Temporal Feature Extraction Network", 《2020 INTERNATIONAL CONFERENCE ON PERVASIVE ARTIFICIAL INTELLIGENCE (ICPAI)》, pages 164 - 169 *

Also Published As

Publication number Publication date
CN113129319B (en) 2023-06-23

Similar Documents

Publication Publication Date Title
Chen et al. Fsrnet: End-to-end learning face super-resolution with facial priors
JP7309116B2 (en) Gaze direction identification method, device, electronic device, and storage medium
US11341770B2 (en) Facial image identification system, identifier generation device, identification device, image identification system, and identification system
CN113469040B (en) Image processing method, device, computer equipment and storage medium
Kontschieder et al. Structured labels in random forests for semantic labelling and object detection
CN113128271A (en) Counterfeit detection of face images
Shrivastava et al. Artificial neural network based optical character recognition
CN111598065B (en) Depth image acquisition method, living body identification method, apparatus, circuit, and medium
CN113191938B (en) Image processing method, image processing device, electronic equipment and storage medium
Meenakshisundaram et al. A combined deep CNN-LSTM network for chromosome classification for metaphase selection
CN110163241B (en) Data sample generation method and device, computer equipment and storage medium
CN112651364B (en) Image processing method, device, electronic equipment and storage medium
CN112733946B (en) Training sample generation method and device, electronic equipment and storage medium
EP3989237A2 (en) Disease diagnosis system and method for performing segmentation by using neural network and unlocalized block
CN112802081A (en) Depth detection method and device, electronic equipment and storage medium
CN115131759A (en) Traffic marking recognition method, device, computer equipment and storage medium
CN113762117A (en) Training method of image processing model, image processing model and computer equipment
CN113129319A (en) Image processing method, image processing device, computer equipment and storage medium
CN113240760A (en) Image processing method and device, computer equipment and storage medium
JP2008040557A (en) Image display apparatus, image display method, and image display program
CN115861122A (en) Face image processing method and device, computer equipment and storage medium
CN113469041A (en) Image processing method and device, computer equipment and storage medium
CN116206114B (en) Portrait extraction method and device under complex background
CN114550247A (en) Facial expression recognition method and system with expression intensity change and storage medium
KR101365404B1 (en) an image recognition method and the image recognition device using thereof

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant