WO2022078041A1 - Training method for occlusion detection model and beautification processing method for face image - Google Patents

Training method for occlusion detection model and beautification processing method for face image

Info

Publication number
WO2022078041A1
Authority
WO
WIPO (PCT)
Prior art keywords
face image
key point
occlusion
occluder
detection model
Prior art date
Application number
PCT/CN2021/112308
Other languages
English (en)
French (fr)
Inventor
李滇博
Original Assignee
上海哔哩哔哩科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海哔哩哔哩科技有限公司
Priority to EP21879088.9A (EP4207053A4)
Publication of WO2022078041A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • G06V40/171Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/77Retouching; Inpainting; Scratch removal
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10024Color image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20076Probabilistic image processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Definitions

  • the present application relates to the technical field of image processing, and in particular, to a method for training an occlusion detection model and a method for beautifying a face image using the occlusion detection model.
  • the detection technology of key points in face images is developing day by day, including the traditional SDM, 3000FPS method and the recent key point detection method based on deep learning, which has reached a new height in detection speed and accuracy.
  • the inventor found that the existing face key point detection is based on the fact that the face image does not contain any occluders. In the case of including an occluder, the prior art cannot accurately judge the occluder, thereby affecting the accuracy of face key point detection.
  • the purpose of this application is to provide a technical solution that can accurately determine whether a key point in a face image is occluded, so as to solve the above-mentioned problems in the prior art.
  • the present application provides a training method for an occlusion detection model, comprising the following steps:
  • the training sample data includes the first face image to which the occluder is added, the coordinate value of the first key point in the first face image, and the occlusion information of the first key point;
  • the first face image is used as input data, and the coordinate value of the first key point and the occlusion information of the first key point are used as output data to train the occlusion detection model, so that, based on any input second face image, the occlusion detection model outputs the coordinate values of the second key points contained in the second face image and the occlusion probabilities of the second key points.
  • the step of constructing multiple training sample data includes:
  • the target occluder and the face image are synthesized to obtain the first face image with the occluder added;
  • the coordinate values of the first key points in the first face image and the occlusion information of each of the first key points are recorded.
  • the step of acquiring an original occluder image containing an occluder and extracting a target occluder from the original occluder image includes:
  • the step of synthesizing the target occlusion object and the face image to obtain the first face image with the occlusion object added includes:
  • the second object has the same size and shape as the first object
  • the pixel value of the corresponding pixel in the second object is replaced with the pixel value of each pixel in the first object.
  • the step of synthesizing the target occlusion object and the face image to obtain the first face image with the occlusion object added includes:
  • the second object has the same size and shape as the first object
  • the pixel value of the corresponding pixel in the second object is replaced with the pixel value of each pixel in the first object.
  • the step of constructing multiple training sample data includes:
  • the key point feature and the apparent feature are input into a decoder, and the first face image is generated by the decoder.
  • the first encoder, the second encoder and the decoder are obtained by training through the following steps:
  • the first encoder, the second encoder and the decoder are back-trained based on the loss function.
  • the step of using the first face image as input data and using the coordinate value of the first key point and the occlusion information of the first key point as output data to train the occlusion detection model includes:
  • the first neural network is trained, so that the first neural network outputs the coordinate value of the predicted key point based on the inputted first face image;
  • Select the output of a hidden layer in the first neural network, take the output of the hidden layer as input to train the second neural network, and output the occlusion probability of the predicted key point;
  • the first loss function of the first neural network is determined according to the coordinates of the predicted key points and the coordinate values of the first key points, and the second loss function of the second neural network is determined according to the occlusion probabilities of the predicted key points and the occlusion information of the first key points;
  • a comprehensive loss function of the occlusion detection model is determined according to the first loss function and the second loss function, and reverse training is performed based on the comprehensive loss function to determine the occlusion parameters in the model.
  • the comprehensive loss function is expressed by a formula in which p_i represents the occlusion probability of the i-th predicted key point, l_i represents the first loss function of the first neural network, o_i represents the second loss function of the second neural network, and λ_1 and λ_2 represent empirical parameters respectively.
  • the present application also proposes a beautification processing method for a face image, including:
  • the face image is beautified according to the occlusion probability.
  • the present application also provides a training device for an occlusion detection model, including:
  • the sample data construction module is adapted to construct a plurality of pieces of training sample data, the training sample data including the first face image to which the occluder has been added, the coordinate value of the first key point in the first face image, and the occlusion information of the first key point;
  • a model training module adapted to use the first face image as input data, and use the coordinate value of the first key point and the occlusion information of the first key point as output data to train an occlusion detection model, so that the occlusion detection model outputs the coordinate value of the second key point included in the second face image and the occlusion probability of the second key point based on the input of any second face image.
  • the present application also provides a beautification processing device for a human face image, including:
  • an image acquisition module which is suitable for acquiring the third face image to be processed
  • an occlusion detection module adapted to input the third face image into the above-mentioned occlusion detection model, and output the coordinate value of the third key point in the third face image and the occlusion probability of the third key point;
  • a beautification module adapted to beautify the face image according to the occlusion probability.
  • the present application also provides a computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer-readable instructions:
  • the training sample data includes the first face image to which the occluder is added, the coordinate value of the first key point in the first face image, and the occlusion information of the first key point;
  • the first face image is used as input data, and the coordinate value of the first key point and the occlusion information of the first key point are used as output data to train the occlusion detection model, so that, based on any input second face image, the occlusion detection model outputs the coordinate values of the second key points contained in the second face image and the occlusion probabilities of the second key points.
  • the present application also provides a computer-readable storage medium on which computer-readable instructions are stored, and when the computer-readable instructions are executed by a processor, the following steps are implemented:
  • the training sample data includes the first face image to which the occluder is added, the coordinate value of the first key point in the first face image, and the occlusion information of the first key point;
  • the first face image is used as input data, and the coordinate value of the first key point and the occlusion information of the first key point are used as output data to train the occlusion detection model, so that, based on any input second face image, the occlusion detection model outputs the coordinate values of the second key points contained in the second face image and the occlusion probabilities of the second key points.
  • the training method of the occlusion detection model and the beautification processing method of the face image provided by the present application can accurately identify whether the key points in the face image are occluded, and on this basis, perform corresponding beautification processing on the face image.
  • This application first constructs an occluder-added face image based on an existing single face image and a single occluder image, and annotates the positions of the key points in the occluder-added face image as well as whether each key point is occluded.
  • a neural network model is then trained with the constructed occluded face images and the corresponding annotation data, so that an occlusion detection model that can accurately predict whether the key points in the face are occluded is obtained.
  • the detection results of the occlusion detection model are used to perform corresponding beautification processing on different positions of the face image, which can effectively improve the accuracy and authenticity of the face image recognition and improve the user experience.
  • FIG. 1 shows a flowchart of a training method for an occlusion detection model according to Embodiment 1 of the present application
  • FIG. 2 shows a schematic flowchart of constructing training sample data in Embodiment 1 of the present application
  • FIG. 3 shows a schematic flowchart of synthesizing a first face image in Embodiment 1 of the present application
  • FIG. 4 shows another schematic flowchart of synthesizing the first face image in Embodiment 1 of the present application
  • FIG. 5 shows a schematic diagram of a network for synthesizing a first face image by an encoder and a decoder in Embodiment 1 of the present application;
  • FIG. 6 shows a schematic flowchart of training a first encoder, a second encoder, and a decoder in Embodiment 1 of the present application;
  • FIG. 7 shows a schematic flowchart of training an occlusion detection model in Embodiment 1 of the present application
  • FIG. 8 shows a schematic structural diagram of the first neural network and the second neural network in Embodiment 1 of the present application
  • FIG. 9 shows a schematic diagram of a program module of a training device for an occlusion detection model in Embodiment 1 of the present application.
  • FIG. 10 shows a schematic diagram of the hardware structure of a training device for an occlusion detection model in Embodiment 1 of the present application
  • FIG. 11 shows a schematic flowchart of a method for beautifying a face image in Embodiment 2 of the present application
  • FIG. 12 shows a schematic diagram of program modules of an apparatus for beautifying a face image in Embodiment 2 of the present application.
  • this embodiment proposes a training method for an occlusion detection model, including the following steps:
  • S100 Construct training sample data, where the training sample data includes a first face image with an occluder added, a coordinate value of a first key point in the first face image, and occlusion information of the first key point.
  • the first face image with the occluder added can be constructed on the basis of the existing face image.
  • the key points of the first face image do not change compared with the existing face image, so any existing key point detection technique can be used to determine the coordinate values of the key points in the existing face image, that is, the coordinate values of the first key points in the first face image.
  • the occluder can be added to any position in the face image, and the first key points in the first face image are then marked with occlusion information according to the added position.
  • the occlusion information can include unoccluded (set to 0) and occluded (set to 1).
  • suppose the first key point is the left corner of the mouth, the occluder is a mask, and the added position is the face area under the eyes in the face image; in this case, the occlusion information of the left corner of the mouth is occluded.
  • when the first key point is the end of the left eyebrow, it is obvious that the occlusion information of the end of the left eyebrow is not occluded.
  • S200 Use the first face image as input data, and use the coordinate value of the first key point and the occlusion information of the first key point as output data to train an occlusion detection model, so that, based on any input second face image, the occlusion detection model outputs the coordinate value of the second key point included in the second face image and the occlusion probability of the second key point.
  • any existing neural network model can be used to perform machine learning training based on the training sample data.
  • models such as mobilenet and shufflenet can be selected for the mobile terminal, and models such as resnet and inception can be selected for the cloud.
  • the selection of the model can be determined according to different requirements of application scenarios, which is not limited in this application.
  • the occlusion detection model predicts and outputs the coordinate value of each key point contained in the first face image, and the probability of whether the key point is occluded.
  • the above probability can be any value between [0, 1], and the larger the value, the greater the probability of being blocked.
  • the coordinate value of the first key point and the occlusion information of the first key point can be used as the true value data to calculate the coordinate value of each key point predicted by the occlusion detection model and the loss function corresponding to the occlusion probability of the key point.
  • the loss function can be chosen as, for example, MSE loss or wing loss. Any existing optimization algorithm can be used to minimize the loss function, such as gradient descent or adaptive moment estimation, to determine the occlusion parameters of the occlusion detection model, such as the weight value corresponding to each neuron.
  • this embodiment can obtain a stable and ideal occlusion detection model, which can automatically predict the coordinates of key points and the occlusion probability of key points for any input face image.
  • FIG. 2 shows a schematic flowchart of constructing training sample data in Embodiment 1 of the present application.
  • step S100 constructing training sample data includes:
  • the original face image may contain multiple pixels, and each pixel has a corresponding pixel value, such as a color value composed of RGB.
  • the original face image can be represented in the form of a matrix, and each element in the matrix corresponds to a pixel.
  • S120 Acquire an original occluder image including an occluder, and extract a target occluder from the original occluder image.
  • the size of the original occlusion image is preferably the same as that of the original face image, that is, it contains the same number of pixels.
  • the original occlusion image can also be represented in the form of a matrix.
  • existing computer vision techniques can be used to delete the background information in the original occluder image, so as to extract a pure target occluder free of background interference.
  • the original occluder image can be segmented by using image segmentation techniques such as deep learning networks such as Mask R-CNN, DeepLab, etc., or the traditional graphcut method, to obtain the segmentation contour (Mask).
  • the segmentation contour is used to distinguish the foreground and the background in the original occluder image. Specifically, the area where the target occluder is located is used as the foreground, and the other areas are used as the background. It should be noted that the segmentation contour is only a simple binary segmentation of the contour of the target occluder. For example, the pixel values in the foreground are set to 1 (that is, pure white), and the pixel values in the background are set to 0 (that is, pure black).
  • the color of the real target occluder may not necessarily be pure black. Therefore, in this step, the segmentation contour and the original occluder image are used for convolution calculation, and the target occluder can be obtained, that is, the real pixel value corresponding to each pixel in the target occluder can be obtained.
  • S130 Synthesize the target occluder and the face image to obtain the first face image to which the occluder is added.
  • FIG. 3 shows a schematic flowchart of synthesizing a first face image in Embodiment 1 of the present application.
  • step S130 includes:
  • the first object may include all or part of the pixels in the target occlusion.
  • affine transformation may also be performed on the first object, such as translation, scaling, flipping, rotation, staggering, etc., to obtain the first object in different states.
  • S132 Select a second object from any position in the face image, where the size and shape of the second object are the same as the first object. For example, if the shape of the first object is an ellipse and contains M pixels, the second object also needs to be an ellipse and contains M pixels.
  • the size and shape of the second object need to be the same as the size and shape of the transformed first object.
  • the color of the target occluder is different from the skin color of the face. It can be understood that setting the pixel value of the corresponding pixel in the second object as the pixel value of the pixel in the first object can visually achieve the effect of occluding the first object on the face image.
  • suppose the original face image marked with key points is denoted by A, the original occluder image by I, and the segmentation contour by M; then the target occluder Z = I * M.
  • if Z' is obtained by applying an affine transformation to Z, and M' is obtained by applying the same affine transformation to M, the resulting first face image B can be expressed as B = A * (1 - M') + Z'.
  • S140 Record the coordinate value of the first key point in the first face image, and the occlusion information of each of the first key points.
  • the coordinate value of the first key point is determined in advance, and the occlusion information of the first key point is determined according to the area range of the second object. For example, the occlusion information of the first key point that falls within the area of the second object is occluded (set to 1), and the occlusion information of the first key point that falls outside the area of the second object is not occluded ( set to 0).
  • the above steps can obtain the first face image with the occluder added, which is beneficial to provide abundant training sample data, thereby improving the accuracy of the occlusion detection model.
  • FIG. 4 shows another schematic flowchart of synthesizing the first face image in Embodiment 1 of the present application.
  • step S130 includes:
  • S131' Acquire a third face image that is marked with key points and contains no occluder, and a fourth face image that is not marked with key points and contains an occluder.
  • the third face image in this embodiment may be, for example, a frontal bareheaded photo, and each key point has been marked in advance.
  • the fourth face image in this embodiment includes many occluders, such as wearing hats, masks, glasses, etc., and the fourth face image does not need to be marked with key points in advance.
  • S132' Use the first encoder to extract the key point features in the third face image.
  • the first encoder may be composed of any neural network, such as a convolutional neural network. Since the key points in the third face image have been marked in advance, the key point features extracted by the first encoder have high accuracy.
  • S133' Extract the apparent features in the fourth face image using the second encoder; wherein the apparent features include occluder features.
  • Apparent features refer to features in a face image other than key point features, such as facial appearance features, accessory features, occlusion features, and the like.
  • S134' Input the key point feature and the apparent feature into a decoder, and use the decoder to generate the first face image.
  • FIG. 5 shows a schematic diagram of a network for synthesizing a first face image through an encoder and a decoder in Embodiment 1 of the present application.
  • Eg represents the first encoder, which is used to extract key point features
  • Ea represents the second encoder, which is used to extract apparent features
  • D represents the decoder, which synthesizes the key point features extracted by Eg with the apparent features extracted by Ea and finally generates the first face image.
  • a decoder is used to restore the extracted key point features and apparent features to the first face image.
  • the key point features in the first face image come from the third face image.
  • the apparent features in the first face image come from the fourth face image, and the apparent features in the first face image contain relatively many occlusion features. In this way, the first face image with the occluder added is constructed.
  • in the first face image synthesized by the encoder-decoder network, the position coordinates of the face key points are known, but the coordinates of the occluder are not yet known.
  • an existing facial-feature segmentation algorithm is therefore further used to identify the face regions or facial-feature regions in the picture; the unrecognized part of the face image is then the location of the occluder.
  • once the position of the occluder has been determined, it can be further determined which key points of the face are occluded and which are not.
  • FIG. 6 shows a schematic flowchart of training the first encoder, the second encoder, and the decoder in Embodiment 1 of the present application.
  • the neural network composed of the first encoder, the second encoder and the decoder in this embodiment is obtained by training through the following steps:
  • S610 Use the first encoder to extract the target key point feature in the third face image, and this step is used to train the first encoder to extract the key point feature.
  • S620 Use the second encoder to extract the target apparent feature in the third face image, and this step is used to train the ability of the second encoder to extract the apparent feature.
  • S630 Input the target key point feature and the target apparent feature into the decoder, and use the decoder to generate a target face image. This step is used to train the decoder's ability to synthesize images based on keypoint features and appearance features.
  • the loss function L in this embodiment is expressed by a formula in which x represents the third face image, y represents the key point features in the third face image, z represents the apparent features in the third face image, G(Ea+Eg) represents the target face image generated by the decoder, q represents the distribution probability of the predicted data, p represents the distribution probability of the true data, and KL represents the divergence function.
  • S650 Perform reverse training on the first encoder, the second encoder and the decoder based on the loss function.
  • FIG. 7 shows a schematic flowchart of training an occlusion detection model in Embodiment 1 of the present application.
  • step S200 includes:
  • the first neural network in this embodiment may be any existing neural network model.
  • models such as mobilenet and shufflenet may be selected for the mobile terminal, and models such as resnet and inception may be selected for the cloud, which are not limited in this application.
  • since the first face image and the coordinate values of the first key points have already been obtained above, the first face image is used as input data and the coordinate values of the first key points as ground-truth data to train the first neural network, so that it outputs predicted key point coordinate values close to the coordinate values of the first key points.
  • S220 Select the output of the hidden layer in the first neural network, use the output of the hidden layer as the input, train the second neural network, and output the occlusion probability of the predicted key point.
  • any neural network includes an input layer, an output layer and a hidden layer, wherein the specific number of layers of the hidden layer can be set to one or more layers according to actual needs.
  • in this step, the output data of one hidden layer is selected and used as the input of the second neural network, and the previously obtained occlusion information of the first key points is used as ground-truth data to train the second neural network, so that it outputs occlusion probabilities of the predicted key points that are close to the occlusion information of the first key points.
  • the occlusion probability can be any value in [0, 1]. The larger the value, the higher the probability that the corresponding prediction key point is occluded.
  • the second neural network may be any existing neural network model.
  • models such as mobilenet and shufflenet may be selected for the mobile terminal, and models such as resnet and inception may be selected for the cloud, which is not limited in this application.
  • FIG. 8 shows a schematic structural diagram of the first neural network and the second neural network in Embodiment 1 of the present application.
  • the first neural network includes a first input layer, a first hidden layer and a first output layer
  • the second neural network includes a second input layer, a second hidden layer and a second output layer.
  • the first input layer is used to receive the input first face image
  • the first output layer is used to output the coordinate values of the predicted key points contained in the first face image.
  • the first hidden layer may specifically include one or more layers, and the output of one of the hidden layers is used as the input of the second hidden layer, so that the occlusion probability of the key point is predicted through the output of the second hidden layer.
  • this embodiment uses two neural networks to construct an occlusion detection model, and can obtain two sets of different output results according to a set of input data, thereby simultaneously predicting the coordinates of key points in the face image and the occlusion probability of key points.
  • S230 Determine the first loss function of the first neural network according to the coordinates of the predicted key points and the coordinate values of the first key points, and determine the second loss function of the second neural network according to the occlusion probabilities of the predicted key points and the occlusion information of the first key points.
  • the loss function characterizes the gap between the predicted data and the ground-truth data.
  • a first neural network and a second neural network are used to construct an occlusion detection model, and a first loss function and a second loss function are correspondingly generated.
  • S240 Determine a comprehensive loss function of the occlusion detection model according to the first loss function and the second loss function.
  • the comprehensive loss function of the occlusion detection model in this embodiment is composed of the first loss function and the second loss function.
  • the comprehensive loss function loss is determined by a formula in which p_i represents the occlusion probability of the i-th predicted key point, l_i represents the first loss function of the first neural network, o_i represents the second loss function of the second neural network, and λ_1 and λ_2 represent empirical parameters respectively.
  • S250 Perform reverse training based on the comprehensive loss function to determine occlusion parameters in the model.
  • the occlusion detection model can be reverse-trained through an optimization algorithm such as back-propagation or gradient descent, and the occlusion parameters in the occlusion detection model can be adjusted so that the comprehensive loss function of the model on the training data set reaches a small value.
  • the above-mentioned occlusion parameter may be a weight value corresponding to each neuron in the occlusion detection model.
  • an ideal occlusion detection model can be obtained in this embodiment, and the model can accurately output the coordinate value of the predicted key point and the occlusion probability of the predicted key point according to any input face image.
  • the training device 90 may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to complete the present application and implement the above training method for the occlusion detection model.
  • the program module referred to in this application refers to an instruction segment of a series of computer-readable instructions capable of accomplishing specific functions, and is more suitable for describing the execution process of the training device 90 of the occlusion detection model in the storage medium than the program itself. The following description will specifically introduce the functions of each program module in this embodiment:
  • the sample data construction module 91 is adapted to construct training sample data, the training sample data including the first face image with the occluder added, the coordinate value of the first key point in the first face image, and the occlusion information of the first key point;
  • the model training module 92 is adapted to use the first face image as input data, and use the coordinate value of the first key point and the occlusion information of the first key point as output data to train an occlusion detection model, so that the The occlusion detection model outputs the coordinate value of the second key point included in the second face image and the occlusion probability of the second key point based on the input of any second face image.
  • This embodiment also provides a computer device, such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a rack server, a blade server, a tower server or a cabinet server (including independent servers, or A server cluster composed of multiple servers), etc.
  • the computer device 100 in this embodiment at least includes, but is not limited to, a memory 101 and a processor 102 that can be communicatively connected to each other through a system bus, as shown in FIG. 10 .
  • FIG. 10 only shows the computer device 100 having components 101-102, but it should be understood that implementation of all of the illustrated components is not required, and more or fewer components may be implemented instead.
  • the memory 101 (that is, a readable storage medium) includes a flash memory, a hard disk, a multimedia card, a card-type memory (eg, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Programmable Read Only Memory (PROM), Magnetic Memory, Magnetic Disk, Optical Disk, etc.
  • the memory 101 may be an internal storage unit of the computer device 100 , such as a hard disk or a memory of the computer device 100 .
  • the memory 101 may also be an external storage device of the computer device 100, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, flash memory card (Flash Card), etc.
  • the memory 101 may also include both the internal storage unit of the computer device 100 and its external storage device.
  • the memory 101 is generally used to store the operating system and various application software installed in the computer device 100 , such as the program code of the training device 90 of the occlusion detection model in the first embodiment.
  • the memory 101 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 102 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 102 is typically used to control the overall operation of the computer device 100 .
  • the processor 102 is configured to run the program code or process data stored in the memory 101 , for example, run the training device 90 of the occlusion detection model, so as to implement the training method of the occlusion detection model of the first embodiment.
  • This embodiment also provides a computer-readable storage medium, which may be a volatile storage medium or a non-volatile storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory ( For example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read only memory (ROM), electrically erasable programmable read only memory (EEPROM), programmable read only memory ( PROM), magnetic storage, magnetic disk, optical disk, server, App application mall, etc., on which computer-readable instructions are stored, and the program realizes corresponding functions when executed by the processor.
  • the computer-readable storage medium of this embodiment is used to store the training device 90 of the occlusion detection model, and when executed by the processor, implements the training method of the occlusion detection model of the first embodiment.
  • this embodiment proposes a method for beautifying a face image, including the following steps:
  • S100' Obtain a fifth face image to be processed, such as a photo taken by a user.
  • S200' input the fifth face image into the above-mentioned occlusion detection model, output the coordinate value of the fifth key point in the fifth face image and the occlusion probability of the fifth key point. For example, output the coordinate positions of multiple key points such as eyes, nose, lips, jaw, etc. in the photo taken by the user, and output the probability that there are occluders at the coordinate positions of these key points, such as whether the eyes are blocked, whether the mouth is blocked, etc.
  • S300' beautify the face image according to the occlusion probability.
  • the beautification processing is applied only at key point positions determined to be unoccluded, and key point positions determined to be occluded are not processed.
  • for example, when it is detected that the user's lips are covered by a mask, no color enhancement is applied to the lip position; or when it is detected that the user's eyes are covered by sunglasses, no contouring is applied to the eye position.
  • the beautification processing of the face image can be made more in line with the real scene, thereby improving the user experience.
  • FIG. 12 shows an apparatus 120 for beautifying a face image, including:
  • an image acquisition module 121 adapted to acquire a third face image to be processed
  • the occlusion detection module 122 is adapted to input the third face image into the above-mentioned occlusion detection model, and output the coordinate value of the third key point in the third face image and the occlusion probability of the third key point;
  • the beautification module 123 is adapted to perform beautification processing on the face image according to the occlusion probability.
  • any process or method description in the flowcharts or otherwise described herein may be understood to represent a module, segment or portion of code comprising one or more executable instructions for implementing a specified logical function or step of the process, and the scope of the preferred embodiments of the present application includes alternative implementations in which the functions may be performed out of the order shown or discussed, including substantially concurrently or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present application belong.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Image Analysis (AREA)

Abstract

A training method for an occlusion detection model and a beautification processing method for a face image. The training method includes the following steps: constructing a plurality of pieces of training sample data, the training sample data including a first face image to which an occluder has been added, coordinate values of first key points in the first face image, and occlusion information of the first key points (S100); and training an occlusion detection model by using the first face image as input data and using the coordinate values of the first key points and the occlusion information of the first key points as output data, so that, on the basis of any input second face image, the occlusion detection model outputs coordinate values of second key points contained in the second face image and occlusion probabilities of the second key points (S200).

Description

Training method for occlusion detection model and beautification processing method for face image
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on October 16, 2020 under application number 202011111254.2 and entitled "遮挡检测模型的训练方法及人脸图像的美化处理方法" (Training method for occlusion detection model and beautification processing method for face image), the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the technical field of image processing, and in particular to a training method for an occlusion detection model and a beautification processing method for a face image that uses the occlusion detection model.
Background
Detection technology for key points in face images is developing rapidly. Traditional methods such as SDM and 3000FPS, as well as recent key point detection methods based on deep learning, have reached new heights in detection speed and accuracy. However, the inventor found that existing face key point detection assumes that the face image contains no occluders. When an occluder is present, the prior art cannot judge the occluder accurately, which affects the accuracy of face key point detection.
Summary
The purpose of this application is to provide a technical solution capable of accurately determining whether a key point in a face image is occluded, so as to solve the above problems in the prior art.
To achieve the above purpose, this application provides a training method for an occlusion detection model, comprising the following steps:
constructing a plurality of pieces of training sample data, the training sample data including a first face image to which an occluder has been added, coordinate values of first key points in the first face image, and occlusion information of the first key points;
training an occlusion detection model by using the first face image as input data and using the coordinate values of the first key points and the occlusion information of the first key points as output data, so that, based on any input second face image, the occlusion detection model outputs coordinate values of second key points contained in the second face image and occlusion probabilities of the second key points.
According to the training method for an occlusion detection model provided by this application, the step of constructing a plurality of pieces of training sample data includes:
acquiring an original face image that does not contain an occluder;
acquiring an original occluder image that contains an occluder, and extracting a target occluder from the original occluder image;
synthesizing the target occluder and the face image to obtain the first face image to which the occluder has been added;
recording the coordinate values of the first key points in the first face image and the occlusion information of each of the first key points.
According to the training method for an occlusion detection model provided by this application, the step of acquiring an original occluder image that contains an occluder and extracting a target occluder from the original occluder image includes:
obtaining a segmentation contour of the target occluder on the basis of an image segmentation technique;
performing a convolution calculation with the segmentation contour and the original occluder image to obtain the target occluder.
According to the training method for an occlusion detection model provided by this application, the step of synthesizing the target occluder and the face image to obtain the first face image to which the occluder has been added includes:
selecting a first object from the target occluder;
selecting a second object from an arbitrary position in the face image, the second object having the same size and shape as the first object;
replacing the pixel value of each corresponding pixel in the second object with the pixel value of each pixel contained in the first object.
According to the training method for an occlusion detection model provided by this application, the step of synthesizing the target occluder and the face image to obtain the first face image to which the occluder has been added includes:
selecting a target object from the occluder image, and randomly transforming the target object to obtain a first object;
selecting a second object from an arbitrary position in the face image, the second object having the same size and shape as the first object;
replacing the pixel value of each corresponding pixel in the second object with the pixel value of each pixel contained in the first object.
According to the training method provided by this application, the step of constructing a plurality of pieces of training sample data includes:
acquiring a third face image that is marked with key points and contains no occluder, and a fourth face image that is not marked with key points and contains an occluder;
extracting key point features in the third face image by using a first encoder;
extracting apparent features in the fourth face image by using a second encoder, where the apparent features include occluder features;
inputting the key point features and the apparent features into a decoder, and generating the first face image by using the decoder.
According to the training method provided by this application, the first encoder, the second encoder and the decoder are obtained by training through the following steps:
extracting target key point features in the third face image by using the first encoder;
extracting target apparent features in the third face image by using the second encoder;
inputting the target key point features and the target apparent features into the decoder, and generating a target face image by using the decoder;
using the third face image as ground-truth data, and determining a loss function between the target face image and the ground-truth data;
performing reverse training on the first encoder, the second encoder and the decoder on the basis of the loss function.
According to the training method for an occlusion detection model provided by this application, the step of training the occlusion detection model by using the first face image as input data and using the coordinate values of the first key points and the occlusion information of the first key points as output data includes:
training a first neural network so that the first neural network outputs coordinate values of predicted key points on the basis of the input first face image;
selecting the output of a hidden layer in the first neural network, using the output of the hidden layer as input to train a second neural network, and outputting occlusion probabilities of the predicted key points;
determining a first loss function of the first neural network according to the coordinates of the predicted key points and the coordinate values of the first key points, and determining a second loss function of the second neural network according to the occlusion probabilities of the predicted key points and the occlusion information of the first key points;
determining a comprehensive loss function of the occlusion detection model according to the first loss function and the second loss function;
performing reverse training on the basis of the comprehensive loss function to determine occlusion parameters in the model.
According to the training method for an occlusion detection model provided by this application, the expression of the comprehensive loss function is:
Figure PCTCN2021112308-appb-000001
where p_i represents the occlusion probability of the i-th predicted key point, l_i represents the first loss function of the first neural network, o_i represents the second loss function of the second neural network, and λ_1 and λ_2 represent empirical parameters respectively.
To achieve the above purpose, this application further provides a beautification processing method for a face image, comprising:
acquiring a fifth face image to be processed;
inputting the fifth face image into the above occlusion detection model, and outputting coordinate values of fifth key points in the fifth face image and occlusion probabilities of the fifth key points;
performing beautification processing on the face image according to the occlusion probabilities.
To achieve the above purpose, this application further provides a training apparatus for an occlusion detection model, comprising:
a sample data construction module, adapted to construct a plurality of pieces of training sample data, the training sample data including a first face image to which an occluder has been added, coordinate values of first key points in the first face image, and occlusion information of the first key points;
a model training module, adapted to train an occlusion detection model by using the first face image as input data and using the coordinate values of the first key points and the occlusion information of the first key points as output data, so that, based on any input second face image, the occlusion detection model outputs coordinate values of second key points contained in the second face image and occlusion probabilities of the second key points.
To achieve the above purpose, this application further provides a beautification processing apparatus for a face image, comprising:
an image acquisition module, adapted to acquire a third face image to be processed;
an occlusion detection module, adapted to input the third face image into the above occlusion detection model, and to output coordinate values of third key points in the third face image and occlusion probabilities of the third key points;
a beautification module, adapted to perform beautification processing on the face image according to the occlusion probabilities.
To achieve the above purpose, this application further provides a computer device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
constructing a plurality of pieces of training sample data, the training sample data including a first face image to which an occluder has been added, coordinate values of first key points in the first face image, and occlusion information of the first key points;
training an occlusion detection model by using the first face image as input data and using the coordinate values of the first key points and the occlusion information of the first key points as output data, so that, based on any input second face image, the occlusion detection model outputs coordinate values of second key points contained in the second face image and occlusion probabilities of the second key points.
To achieve the above purpose, this application further provides a computer-readable storage medium on which computer-readable instructions are stored, where the computer-readable instructions, when executed by a processor, implement the following steps:
constructing a plurality of pieces of training sample data, the training sample data including a first face image to which an occluder has been added, coordinate values of first key points in the first face image, and occlusion information of the first key points;
training an occlusion detection model by using the first face image as input data and using the coordinate values of the first key points and the occlusion information of the first key points as output data, so that, based on any input second face image, the occlusion detection model outputs coordinate values of second key points contained in the second face image and occlusion probabilities of the second key points.
The training method for an occlusion detection model and the beautification processing method for a face image provided by this application can accurately identify whether key points in a face image are occluded, and on this basis perform corresponding beautification processing on the face image. This application first constructs a face image with an added occluder from an existing single face image and a single occluder image, and annotates the positions of the key points in the occluder-added face image as well as whether each key point is occluded. A neural network model is then trained with the constructed occluded face images and the corresponding annotation data, so as to obtain an occlusion detection model that can accurately predict whether the key points in a face are occluded. The detection results of the occlusion detection model are then used to apply corresponding beautification processing to different positions of the face image, which can effectively improve the accuracy and authenticity of face image recognition and improve the user experience.
Brief Description of the Drawings
FIG. 1 shows a flowchart of a training method for an occlusion detection model according to Embodiment 1 of this application;
FIG. 2 shows a schematic flowchart of constructing training sample data in Embodiment 1 of this application;
FIG. 3 shows one schematic flowchart of synthesizing a first face image in Embodiment 1 of this application;
FIG. 4 shows another schematic flowchart of synthesizing the first face image in Embodiment 1 of this application;
FIG. 5 shows a schematic diagram of a network for synthesizing the first face image through encoders and a decoder in Embodiment 1 of this application;
FIG. 6 shows a schematic flowchart of training the first encoder, the second encoder and the decoder in Embodiment 1 of this application;
FIG. 7 shows a schematic flowchart of training the occlusion detection model in Embodiment 1 of this application;
FIG. 8 shows a schematic structural diagram of the first neural network and the second neural network in Embodiment 1 of this application;
FIG. 9 shows a schematic diagram of the program modules of a training apparatus for an occlusion detection model in Embodiment 1 of this application;
FIG. 10 shows a schematic diagram of the hardware structure of the training apparatus for an occlusion detection model in Embodiment 1 of this application;
FIG. 11 shows a schematic flowchart of a beautification processing method for a face image in Embodiment 2 of this application;
FIG. 12 shows a schematic diagram of the program modules of a beautification processing apparatus for a face image in Embodiment 2 of this application.
Detailed Description
To make the purpose, technical solutions and advantages of this application clearer, this application is further described in detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain this application and are not intended to limit it. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in this application without creative effort fall within the protection scope of this application.
Embodiment 1
Referring to FIG. 1, this embodiment proposes a training method for an occlusion detection model, comprising the following steps:
S100: Construct training sample data, the training sample data including a first face image to which an occluder has been added, coordinate values of first key points in the first face image, and occlusion information of the first key points.
The publicly available face key point datasets are limited in number, and those containing occluders are even rarer. In real life, however, people often wear protective items such as hats, glasses and masks when images are taken, and these items may occlude some key points of the face. Existing key point detection techniques cannot distinguish such occlusions and may still identify the corresponding facial-feature key points at the positions of the protective items, so that the recognized face image does not match the real situation. In addition, some existing beauty or makeup functions apply beautification such as color enhancement and contour retouching to the detected face key points; if the occluders in the face image cannot be identified accurately, odd effects such as lipstick drawn on top of a mask may appear, resulting in a poor user experience. Therefore, this step constructs a first face image with an added occluder on the basis of an existing face image. Compared with the existing face image, the key points of the first face image do not change, so any existing key point detection technique can be used to determine the coordinate values of the key points in the existing face image, that is, the coordinate values of the first key points in the first face image. The occluder can be added at any position in the face image, and the first key points in the first face image are then labeled with occlusion information according to the position where the occluder was added; the occlusion information may include unoccluded (set to 0) and occluded (set to 1). Suppose the first key points include the left corner of the mouth, the occluder is a mask, and the position where it is added is the facial area below the eyes in the face image; in this case, the occlusion information of the left corner of the mouth is occluded. When a first key point is the end of the left eyebrow, the occlusion information of the end of the left eyebrow is obviously unoccluded.
S200: Use the first face image as input data, and use the coordinate values of the first key points and the occlusion information of the first key points as output data to train an occlusion detection model, so that, based on any input second face image, the occlusion detection model outputs coordinate values of second key points contained in the second face image and occlusion probabilities of the second key points.
With the training sample data obtained, any existing neural network model can be used for machine-learning training on the training sample data; for example, models such as mobilenet and shufflenet can be chosen for mobile terminals, and models such as resnet and inception for the cloud. The choice of model can be determined according to the different requirements of the application scenario, which is not limited in this application.
The first face image in the training sample data is input into the occlusion detection model, so that the occlusion detection model predicts and outputs the coordinate values of each key point contained in the first face image and the probability of whether that key point is occluded. The above probability can be any value in [0, 1]; the larger the value, the higher the probability of being occluded. The coordinate values of the first key points and the occlusion information of the first key points can be used as ground-truth data to compute the loss function corresponding to the key point coordinate values and key point occlusion probabilities predicted by the occlusion detection model; the loss function can be, for example, MSE loss or wing loss. Any existing optimization algorithm, such as gradient descent or adaptive moment estimation, can be used to minimize the loss function and thereby determine the occlusion parameters of the occlusion detection model, for example the weight values corresponding to each neuron.
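As an illustration of the loss choices just mentioned, the following is a minimal NumPy sketch of the wing loss (Feng et al., 2018); the width and epsilon values are illustrative assumptions and are not specified in this application.

```python
import numpy as np

def wing_loss(pred, target, w=10.0, eps=2.0):
    """Wing loss for key point regression (Feng et al., 2018).

    pred, target: arrays of shape (N, 2) holding predicted and ground-truth
    key point coordinates. w and eps are illustrative hyper-parameters,
    not values taken from this application.
    """
    x = np.abs(pred - target)
    c = w - w * np.log(1.0 + w / eps)          # makes the two branches meet at |x| = w
    loss = np.where(x < w, w * np.log(1.0 + x / eps), x - c)
    return loss.mean()
```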
Through the above steps, this embodiment can obtain a stable and satisfactory occlusion detection model which, for any input face image, can automatically predict the coordinates of the key points and the occlusion probabilities of the key points.
FIG. 2 shows a schematic flowchart of constructing the training sample data in Embodiment 1 of this application. As shown in FIG. 2, the construction of the training sample data in step S100 includes:
S110: Acquire an original face image that does not contain an occluder.
The original face image may contain a plurality of pixels, and each pixel has a corresponding pixel value, for example a color value composed of RGB. The original face image can thus be represented in the form of a matrix, with each element of the matrix corresponding to one pixel.
S120: Acquire an original occluder image that contains an occluder, and extract a target occluder from the original occluder image.
To simplify the calculation, the size of the original occluder image is preferably the same as that of the original face image, that is, it contains the same number of pixels. Accordingly, the original occluder image can also be represented in the form of a matrix.
It can be understood that, apart from the target occluder, an original occluder image taken directly by a camera may contain a large amount of background information. In this step, existing computer vision techniques can be used to remove the background information from the original occluder image, so as to extract a pure target occluder free of interference.
In one example, the original occluder image can be segmented using an image segmentation technique, for example a deep learning network such as Mask R-CNN or DeepLab, or the traditional graph cut method, to obtain a segmentation contour (mask). The segmentation contour is used to distinguish the foreground from the background in the original occluder image; specifically, the region where the target occluder is located is taken as the foreground and the other regions as the background. It should be noted that the segmentation contour is only a simple binary segmentation of the contour of the target occluder, for example setting the pixel values in the foreground to 1 (pure white) and the pixel values in the background to 0 (pure black).
It can be understood that the real target occluder is not necessarily pure black in color. Therefore, in this step a convolution calculation is performed with the segmentation contour and the original occluder image to obtain the target occluder, that is, to obtain the real pixel value corresponding to each pixel of the target occluder.
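A minimal sketch of step S120, assuming the segmentation contour has already been produced by one of the methods named above (Mask R-CNN, DeepLab, graph cut) and saved as a binary mask image; the file names are placeholders, and the calculation Z = I * M is implemented here as an element-wise product.

```python
import cv2
import numpy as np

# I: original occluder image (H, W, 3); mask: binary segmentation contour (H, W),
# 1 for the occluder (foreground), 0 for the background. File names are placeholders.
I = cv2.imread("occluder.png").astype(np.float32)
mask = (cv2.imread("occluder_mask.png", cv2.IMREAD_GRAYSCALE) > 127).astype(np.float32)

# Z = I * M: keep the real pixel values of the occluder, zero out the background.
Z = I * mask[:, :, None]
```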
S130: Synthesize the target occluder and the face image to obtain the first face image to which the occluder has been added.
FIG. 3 shows a schematic flowchart of synthesizing the first face image in Embodiment 1 of this application. As shown in FIG. 3, step S130 includes:
S131: Select a first object from the target occluder. The first object may contain all or some of the pixels of the target occluder.
To increase the diversity of the data, an affine transformation may also be applied to the first object, for example translation, scaling, flipping, rotation or shearing, to obtain the first object in different states.
S132: Select a second object from an arbitrary position in the face image, the size and shape of the second object being the same as those of the first object. For example, if the first object is an ellipse containing M pixels, the second object must also be an ellipse containing M pixels.
It should be noted that, when an affine transformation has been applied to the first object, the size and shape of the second object must be the same as those of the transformed first object.
S133: Replace the pixel value of each corresponding pixel in the second object with the pixel value of each pixel contained in the first object.
The color of the target occluder is usually different from the skin color of the face. It can be understood that setting the pixel values of the corresponding pixels in the second object to the pixel values of the pixels in the first object visually achieves the effect of occluding the face image with the first object.
Suppose the original face image marked with key points is denoted by A, the original occluder image by I, and the segmentation contour by M; then the target occluder Z = I * M. Suppose Z' is obtained by applying an affine transformation to Z and, correspondingly, M' is obtained by applying the same affine transformation to M. It can be understood that the first face image B finally obtained after the affine transformation can be expressed as B = A * (1 - M') + Z'.
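A sketch of the compositing formula B = A * (1 - M') + Z' with an affine transformation applied to Z and M, using OpenCV; the specific rotation, scale and translation values are illustrative and would in practice be randomised.

```python
import cv2
import numpy as np

def composite(A, Z, mask, angle=15, scale=0.8, tx=40, ty=60):
    """Paste an affinely transformed occluder Z onto face image A.

    Implements B = A * (1 - M') + Z', where Z' and M' are the occluder and its
    binary mask after the same affine transform. A, Z and mask are assumed to
    have the same spatial size; the transform parameters are illustrative.
    """
    h, w = A.shape[:2]
    T = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    T[:, 2] += (tx, ty)                        # add a translation component
    Zp = cv2.warpAffine(Z, T, (w, h))          # Z'
    Mp = (cv2.warpAffine(mask, T, (w, h)) > 0.5).astype(np.float32)  # M'
    # Z' is already zero outside M'; the extra multiplication only cleans
    # up interpolation artefacts at the mask border.
    B = A * (1.0 - Mp[:, :, None]) + Zp * Mp[:, :, None]
    return B.astype(np.uint8), Mp
```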
S140: Record the coordinate values of the first key points in the first face image and the occlusion information of each of the first key points.
The coordinate values of the first key points have been determined in advance, while the occlusion information of the first key points is determined according to the area covered by the second object. For example, the occlusion information of a first key point that falls within the area of the second object is occluded (set to 1), and the occlusion information of a first key point that falls outside the area of the second object is unoccluded (set to 0).
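A small sketch of step S140, assuming the key point coordinates of the original face image are already known and M' is the transformed binary mask returned by the compositing step above.

```python
import numpy as np

def label_occlusion(keypoints, Mp):
    """Mark each first key point as occluded (1) or unoccluded (0).

    keypoints: (N, 2) array of (x, y) coordinates already known for the face image.
    Mp: the transformed binary occluder mask M' from the compositing step.
    A key point that falls inside the pasted region is labelled occluded.
    """
    h, w = Mp.shape
    labels = []
    for x, y in keypoints:
        xi, yi = int(round(x)), int(round(y))
        inside = 0 <= xi < w and 0 <= yi < h and Mp[yi, xi] > 0
        labels.append(1 if inside else 0)
    return np.array(labels)
```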
The above steps produce the first face image with an added occluder, which helps provide abundant training sample data and thereby improves the accuracy of the occlusion detection model.
FIG. 4 shows another schematic flowchart of synthesizing the first face image in Embodiment 1 of this application. As shown in FIG. 4, step S130 includes:
S131': Acquire a third face image that is marked with key points and contains no occluder, and a fourth face image that is not marked with key points and contains an occluder.
The third face image in this embodiment may be, for example, a frontal bareheaded photograph, with each key point marked in advance. The fourth face image in this embodiment contains relatively many occluders, for example a hat, a mask or glasses being worn, and the fourth face image does not need to be marked with key points in advance.
S132': Use a first encoder to extract the key point features in the third face image.
The first encoder may be composed of any neural network, for example a convolutional neural network. Since the key points in the third face image have been marked in advance, the key point features extracted by the first encoder have high accuracy.
S133': Use a second encoder to extract the apparent features in the fourth face image, where the apparent features include occluder features.
Apparent features refer to features of a face image other than key point features, such as facial appearance features, accessory features and occlusion features.
S134': Input the key point features and the apparent features into a decoder, and use the decoder to generate the first face image.
FIG. 5 shows a schematic diagram of a network for synthesizing the first face image through encoders and a decoder in Embodiment 1 of this application. As shown in FIG. 5, Eg denotes the first encoder, which is used to extract the key point features; Ea denotes the second encoder, which is used to extract the apparent features; and D denotes the decoder, which synthesizes the key point features extracted by Eg with the apparent features extracted by Ea and finally generates the first face image.
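A schematic PyTorch sketch of the network in FIG. 5; the 64×64 input resolution, the layer sizes and the simple convolutional stacks are assumptions made for illustration and are not the architecture actually used in this application.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    """Shared shape for Eg (key point features) and Ea (apparent features)."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),    # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),   # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(),  # 16 -> 8
            nn.Flatten(), nn.Linear(128 * 8 * 8, feat_dim),
        )
    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    """D: rebuilds a face image from concatenated key point + apparent features."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.fc = nn.Linear(2 * feat_dim, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(),
        )
    def forward(self, g, a):
        h = self.fc(torch.cat([g, a], dim=1)).view(-1, 128, 8, 8)
        return self.net(h)

Eg, Ea, D = Encoder(), Encoder(), Decoder()
x3, x4 = torch.rand(1, 3, 64, 64), torch.rand(1, 3, 64, 64)   # third / fourth face image
first_face_image = D(Eg(x3), Ea(x4))                          # synthesised occluded face
```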
In this embodiment, the decoder restores the extracted key point features and apparent features into the first face image. It can be understood that the key point features in the first face image come from the third face image, while the apparent features in the first face image come from the fourth face image, and the apparent features in the first face image contain relatively many occlusion features. In this way, the first face image with an added occluder is constructed.
It should be noted that, in the first face image synthesized by the encoder-decoder network, the position coordinates of the face key points are known, but the coordinates of the occluder are not yet known. In this case, an existing facial-feature segmentation algorithm is further required to identify the face regions or facial-feature regions in the picture; the parts of the face image that are not recognized are then the location of the occluder. Once the position of the occluder has been determined, it can be further determined which face key points are occluded and which are not.
FIG. 6 shows a schematic flowchart of training the first encoder, the second encoder and the decoder in Embodiment 1 of this application. As shown in FIG. 6, the neural network composed of the first encoder, the second encoder and the decoder in this embodiment is obtained by training through the following steps:
S610: Use the first encoder to extract the target key point features in the third face image; this step is used to train the ability of the first encoder to extract key point features.
S620: Use the second encoder to extract the target apparent features in the third face image; this step is used to train the ability of the second encoder to extract apparent features.
S630: Input the target key point features and the target apparent features into the decoder, and use the decoder to generate a target face image. This step is used to train the ability of the decoder to synthesize an image on the basis of key point features and apparent features.
S640: Use the third face image as ground-truth data, and determine a loss function between the target face image and the ground-truth data.
Specifically, the loss function L in this embodiment can be expressed by the following formula:
Figure PCTCN2021112308-appb-000002
In the above formula, x denotes the third face image, y denotes the key point features in the third face image, z denotes the apparent features in the third face image, G(Ea+Eg) denotes the target face image generated by the decoder, q denotes the distribution probability of the predicted data, p denotes the distribution probability of the true data, and KL denotes the divergence function.
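The exact expression of L is given only as a formula image in the original. As a loudly hedged stand-in, the sketch below takes p and q to be softmax-normalised pixel distributions of the real third face image x and the generated image G(Ea+Eg), and computes their KL divergence; this particular choice of p and q is purely an assumption made for illustration.

```python
import torch
import torch.nn.functional as F

def encoder_decoder_loss(x, x_hat):
    """Assumed stand-in for the loss L of S640 (the exact formula is not in the text).

    x: the third face image, x_hat: the generated image G(Ea+Eg), both (B, 3, H, W).
    Here p and q are softmax-normalised flattened pixel distributions; the value
    returned is KL(p || q), the divergence between the true and predicted data.
    """
    p = F.softmax(x.flatten(1), dim=1)              # true-data distribution (assumed)
    log_q = F.log_softmax(x_hat.flatten(1), dim=1)  # predicted-data distribution (assumed)
    return F.kl_div(log_q, p, reduction="batchmean")
```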
S650: Perform reverse training on the first encoder, the second encoder and the decoder on the basis of the loss function.
The above process helps improve the accuracy with which the first encoder and the second encoder extract features and with which the decoder restores the image, so that a large number of first face images with added occluders and determined key points can be synthesized.
FIG. 7 shows a schematic flowchart of training the occlusion detection model in Embodiment 1 of this application. As shown in FIG. 7, step S200 includes:
S210: Train a first neural network so that, on the basis of the input first face image, the first neural network outputs the coordinate values of predicted key points.
The first neural network in this embodiment may be any existing neural network model; for example, models such as mobilenet and shufflenet can be chosen for mobile terminals and models such as resnet and inception for the cloud, which is not limited in this application. Since the first face image and the coordinate values of the first key points have already been obtained above, the first face image is used as input data and the coordinate values of the first key points as ground-truth data to train the first neural network, so that it outputs predicted key point coordinate values close to the coordinate values of the first key points.
S220: Select the output of a hidden layer in the first neural network, use the output of the hidden layer as input to train a second neural network, and output the occlusion probabilities of the predicted key points.
It can be understood that any neural network contains an input layer, an output layer and hidden layers, where the specific number of hidden layers can be set to one or more according to actual needs. In this step, the output data of one of the hidden layers is selected and used as the input layer of the second neural network, and the previously obtained occlusion information of the first key points is used as ground-truth data to train the second neural network, so that it outputs occlusion probabilities of the predicted key points that are close to the occlusion information of the first key points. The occlusion probability can be any value in [0, 1]; the larger the value, the more likely the corresponding predicted key point is occluded.
Likewise, the second neural network may be any existing neural network model; for example, models such as mobilenet and shufflenet can be chosen for mobile terminals and models such as resnet and inception for the cloud, which is not limited in this application.
FIG. 8 shows a schematic structural diagram of the first neural network and the second neural network in Embodiment 1 of this application.
As shown in FIG. 8, the first neural network includes a first input layer, a first hidden layer and a first output layer, and the second neural network includes a second input layer, a second hidden layer and a second output layer. In the training phase, the first input layer receives the input first face image, and the first output layer outputs the coordinate values of the predicted key points contained in the first face image. The first hidden layer may specifically contain one or more layers, and the output of one of these hidden layers is used as the input of the second hidden layer, so that the occlusion probabilities of the predicted key points are output through the second hidden layer.
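A PyTorch sketch of the two-branch structure of FIG. 8, in which the output of a hidden layer of the first (key point) network is fed to the second (occlusion) network; the backbone, the layer sizes and the assumption of 68 key points are illustrative only.

```python
import torch
import torch.nn as nn

class OcclusionDetectionModel(nn.Module):
    """Two-branch sketch of FIG. 8 (backbone and sizes are illustrative assumptions).

    The first network regresses the coordinates of N key points from the input
    face image; the output of one of its hidden layers is fed to the second
    network, which predicts an occlusion probability for every key point.
    """
    def __init__(self, num_keypoints=68):
        super().__init__()
        self.num_keypoints = num_keypoints
        self.backbone = nn.Sequential(                      # first hidden layers
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
            nn.Linear(64 * 4 * 4, 256), nn.ReLU(),
        )
        self.kp_head = nn.Linear(256, num_keypoints * 2)    # first output layer: (x, y) per key point
        self.occ_head = nn.Sequential(                      # second network fed with the hidden output
            nn.Linear(256, 128), nn.ReLU(),
            nn.Linear(128, num_keypoints), nn.Sigmoid(),    # occlusion probability in [0, 1]
        )

    def forward(self, x):
        h = self.backbone(x)                                # hidden-layer output shared by both branches
        coords = self.kp_head(h).view(-1, self.num_keypoints, 2)
        occ_prob = self.occ_head(h)
        return coords, occ_prob
```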
通过上述结构,本实施例利用两个神经网络构造遮挡检测模型,可以根据一组输入数据得到两组不同的输出结果,从而可以同时预测人脸图像中的关键点坐标以及关键点的遮挡概率。
S230: Determine a first loss function of the first neural network according to the coordinates of the predicted key points and the coordinate values of the first key points, and determine a second loss function of the second neural network according to the occlusion probabilities of the predicted key points and the occlusion information of the first key points.
A loss function characterizes the gap between predicted data and ground-truth data. This embodiment constructs the occlusion detection model from the first neural network and the second neural network, which correspondingly produce the first loss function and the second loss function.
S240: Determine a combined loss function of the occlusion detection model according to the first loss function and the second loss function.
It can be understood that the combined loss function of the occlusion detection model in this embodiment is jointly composed of the first loss function and the second loss function. In one example, the combined loss function loss is determined by the following formula:
loss = Σ_i [λ1·(1 - p_i)·l_i + λ2·o_i]
where p_i denotes the occlusion probability of the i-th predicted key point, l_i denotes the first loss function of the first neural network, o_i denotes the second loss function of the second neural network, and λ1 and λ2 denote empirical parameters. Determining an appropriate combined loss function helps improve the prediction accuracy of the occlusion detection model.
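A minimal sketch of such a combined loss, assuming PyTorch; the per-key-point loss choices and the exact weighting follow the form given above and are illustrative only:

```python
import torch
import torch.nn.functional as F

def combined_loss(pred_coords, pred_occ, gt_coords, gt_occ,
                  lambda1=1.0, lambda2=0.5):
    """Combined loss of the occlusion detection model.

    pred_coords: B x N x 2 predicted key point coordinates
    pred_occ:    B x N predicted occlusion probabilities p_i
    gt_coords:   B x N x 2 coordinate values of the first key points
    gt_occ:      B x N occlusion information of the first key points (0 or 1)
    """
    # l_i: per-key-point coordinate loss of the first neural network.
    l_i = F.smooth_l1_loss(pred_coords, gt_coords, reduction="none").sum(dim=-1)
    # o_i: per-key-point occlusion loss of the second neural network.
    o_i = F.binary_cross_entropy(pred_occ, gt_occ.float(), reduction="none")
    # Key points that are likely occluded contribute less to the coordinate loss.
    return (lambda1 * (1.0 - pred_occ.detach()) * l_i + lambda2 * o_i).mean()
```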
S250: Perform reverse training based on the combined loss function to determine the occlusion parameters in the model.
In this embodiment, the occlusion detection model may be reversely trained by an optimization algorithm such as back-propagation or gradient descent, and the occlusion parameters in the occlusion detection model are adjusted so that the combined loss function of the model on the training data set reaches a relatively small value. The occlusion parameters may be the weight values corresponding to the neurons in the occlusion detection model.
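A minimal sketch of this reverse-training loop, assuming PyTorch and reusing the model and combined loss sketched above; the data loader is assumed to yield the synthesized first face images together with their recorded key point labels:

```python
import torch

def train(model, loss_fn, loader, epochs=10, lr=1e-3, device="cpu"):
    """Reverse-train the occlusion detection model by gradient descent.

    loader yields (image, gt_coords, gt_occ) batches built from the first
    face images and their recorded labels; loss_fn is the combined loss.
    """
    model.to(device)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        for image, gt_coords, gt_occ in loader:
            image, gt_coords, gt_occ = (t.to(device)
                                        for t in (image, gt_coords, gt_occ))
            pred_coords, pred_occ = model(image)
            loss = loss_fn(pred_coords, pred_occ, gt_coords, gt_occ)
            optimizer.zero_grad()
            loss.backward()      # back-propagation adjusts the occlusion parameters
            optimizer.step()
    return model
```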
Through the above steps, this embodiment obtains a fairly ideal occlusion detection model, which can accurately output the coordinate values of predicted key points and the occlusion probabilities of the predicted key points for any input face image.
Referring further to FIG. 9, a training apparatus for an occlusion detection model is shown. In this embodiment, the training apparatus 90 may include, or be divided into, one or more program modules, which are stored in a storage medium and executed by one or more processors, so as to complete the present application and implement the above training method for an occlusion detection model. A program module referred to in the present application is a series of computer-readable instruction segments capable of completing a specific function, and is better suited than the program itself to describing the execution process of the training apparatus 90 for an occlusion detection model in the storage medium. The following description specifically introduces the functions of the program modules of this embodiment:
a sample data construction module 91, adapted to construct training sample data, the training sample data comprising a first face image with an occluder added, the coordinate values of first key points in the first face image, and the occlusion information of the first key points;
a model training module 92, adapted to train an occlusion detection model with the first face image as input data and with the coordinate values of the first key points and the occlusion information of the first key points as output data, so that, based on any input second face image, the occlusion detection model outputs the coordinate values of second key points contained in the second face image and the occlusion probabilities of the second key points.
This embodiment further provides a computer device, for example a smartphone, a tablet computer, a laptop computer, a desktop computer, a rack server, a blade server, a tower server or a cabinet server (including an independent server, or a server cluster composed of multiple servers) capable of executing a program. As shown in FIG. 10, the computer device 100 of this embodiment at least includes, but is not limited to, a memory 101 and a processor 102 that are communicatively connected to each other via a system bus. It should be pointed out that FIG. 10 only shows the computer device 100 with components 101-102, but it should be understood that not all of the shown components are required to be implemented, and more or fewer components may be implemented instead.
In this embodiment, the memory 101 (i.e. a readable storage medium) includes a flash memory, a hard disk, a multimedia card, a card-type memory (e.g. SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disc and the like. In some embodiments, the memory 101 may be an internal storage unit of the computer device 100, for example a hard disk or internal memory of the computer device 100. In other embodiments, the memory 101 may also be an external storage device of the computer device 100, for example a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card provided on the computer device 100. Of course, the memory 101 may also include both the internal storage unit of the computer device 100 and its external storage device. In this embodiment, the memory 101 is generally used to store the operating system and the various kinds of application software installed on the computer device 100, for example the program code of the training apparatus 90 for an occlusion detection model of Embodiment 1. In addition, the memory 101 may also be used to temporarily store various kinds of data that have been output or are to be output.
In some embodiments, the processor 102 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor or another data processing chip. The processor 102 is generally used to control the overall operation of the computer device 100. In this embodiment, the processor 102 is used to run the program code stored in the memory 101 or to process data, for example to run the training apparatus 90 for an occlusion detection model, so as to implement the training method for an occlusion detection model of Embodiment 1.
This embodiment further provides a computer-readable storage medium, which may be a volatile storage medium or a non-volatile storage medium, such as a flash memory, a hard disk, a multimedia card, a card-type memory (e.g. SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disc, a server, an app store and the like, on which computer-readable instructions are stored, the corresponding functions being implemented when the program is executed by a processor. The computer-readable storage medium of this embodiment is used to store the training apparatus 90 for an occlusion detection model, and when executed by a processor it implements the training method for an occlusion detection model of Embodiment 1.
Embodiment 2
Referring to FIG. 11, this embodiment provides a beautification processing method for a face image, including the following steps:
S100': Acquire a fifth face image to be processed, for example a photograph taken by a user.
S200': Input the fifth face image into the above occlusion detection model, and output the coordinate values of fifth key points in the fifth face image and the occlusion probabilities of the fifth key points. For example, the coordinate positions of multiple key points such as the eyes, nose, lips and jaw in the photograph taken by the user are output, and at the same time the probabilities that occluders exist at the coordinate positions of these key points are output, for example whether the eyes are occluded and whether the mouth is occluded.
S300': Perform beautification processing on the face image according to the occlusion probabilities. For example, the beautification processing is applied only at key point positions determined to be unoccluded, while key point positions determined to be occluded are left unprocessed. For instance, when it is detected that the user's lips are occluded by a mask, no color enhancement is applied to the lip positions; or when it is detected that the user's eye positions are occluded by sunglasses, no contour drawing is applied to the eye positions.
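A minimal sketch of occlusion-aware beautification, assuming NumPy; enhance_fn stands for any local beautification operation and is purely illustrative:

```python
import numpy as np

def beautify(face_img, keypoints, occ_prob, enhance_fn, threshold=0.5):
    """Apply a local beautification effect only at key points judged unoccluded.

    face_img:   H x W x 3 image to be processed
    keypoints:  N x 2 key point coordinates output by the occlusion detection model
    occ_prob:   N occlusion probabilities output by the same model
    enhance_fn: callable(image, (x, y)) -> image applying a local effect such as
                lip color enhancement or eye contour drawing (hypothetical)
    """
    out = face_img.copy()
    for (x, y), p in zip(keypoints, occ_prob):
        if p < threshold:
            # The key point is considered unoccluded, so the effect is applied.
            out = enhance_fn(out, (int(x), int(y)))
        # Occluded key points (e.g. lips behind a mask) are left untouched.
    return out
```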
With the above method, the beautification processing of a face image better matches the real scene, thereby improving the user experience.
Referring further to FIG. 12, a beautification processing apparatus 120 for a face image is shown, including:
an image acquisition module 121, adapted to acquire a fifth face image to be processed;
an occlusion detection module 122, adapted to input the fifth face image into the above occlusion detection model, and output the coordinate values of fifth key points in the fifth face image and the occlusion probabilities of the fifth key points;
a beautification module 123, adapted to perform beautification processing on the face image according to the occlusion probabilities.
The serial numbers of the above embodiments of the present application are for description only and do not represent the superiority or inferiority of the embodiments.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment or portion of code including one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present application includes additional implementations in which functions may be executed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, which should be understood by those skilled in the art to which the embodiments of the present application belong.
Those of ordinary skill in the art can understand that all or some of the steps carried by the methods of the above embodiments can be completed by instructing the relevant hardware through a program; the program may be stored in a computer-readable medium, and when executed, the program includes one of, or a combination of, the steps of the method embodiments.
In the description of this specification, descriptions referring to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" mean that the specific features, structures, materials or characteristics described in connection with the embodiment or example are included in at least one embodiment or example of the present application. In this specification, schematic expressions of the above terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
The above are only preferred embodiments of the present application and do not thereby limit the patent scope of the present application. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the patent protection scope of the present application.

Claims (14)

  1. A training method for an occlusion detection model, comprising the following steps:
    constructing multiple pieces of training sample data, the training sample data comprising a first face image with an occluder added, coordinate values of first key points in the first face image, and occlusion information of the first key points;
    training an occlusion detection model with the first face image as input data and with the coordinate values of the first key points and the occlusion information of the first key points as output data, so that, based on any input second face image, the occlusion detection model outputs coordinate values of second key points contained in the second face image and occlusion probabilities of the second key points.
  2. The training method according to claim 1, wherein the step of constructing multiple pieces of training sample data comprises:
    acquiring an original face image containing no occluder;
    acquiring an original occluder image containing an occluder, and extracting a target occluder from the original occluder image;
    synthesizing the target occluder and the face image to obtain the first face image with the occluder added;
    recording the coordinate values of the first key points in the first face image, and the occlusion information of each of the first key points.
  3. The training method according to claim 2, wherein the step of acquiring an original occluder image containing an occluder and extracting a target occluder from the original occluder image comprises:
    obtaining a segmentation contour of the target occluder based on an image segmentation technique;
    performing a convolution calculation with the segmentation contour and the original occluder image to obtain the target occluder.
  4. The training method according to claim 2 or 3, wherein the step of synthesizing the target occluder and the face image to obtain the first face image with the occluder added comprises:
    selecting a first object from the target occluder;
    selecting a second object at an arbitrary position in the face image, the size and shape of the second object being the same as those of the first object;
    replacing the pixel values of the corresponding pixels in the second object with the pixel values of the pixels contained in the first object.
  5. The training method according to claim 2 or 3, wherein the step of synthesizing the target occluder and the face image to obtain the first face image with the occluder added comprises:
    selecting a target object from the occluder image, and randomly transforming the target object to obtain a first object;
    selecting a second object at an arbitrary position in the face image, the size and shape of the second object being the same as those of the first object;
    replacing the pixel values of the corresponding pixels in the second object with the pixel values of the pixels contained in the first object.
  6. The training method according to claim 1, wherein the step of constructing multiple pieces of training sample data comprises:
    acquiring a third face image that is annotated with key points and contains no occluder, and a fourth face image that is not annotated with key points and contains an occluder;
    extracting key point features of the third face image using a first encoder;
    extracting appearance features of the fourth face image using a second encoder, the appearance features comprising occluder features;
    inputting the key point features and the appearance features into a decoder, and generating the first face image using the decoder.
  7. The training method according to claim 6, wherein the first encoder, the second encoder and the decoder are trained through the following steps:
    extracting target key point features of the third face image using the first encoder;
    extracting target appearance features of the third face image using the second encoder;
    inputting the target key point features and the target appearance features into the decoder, and generating a target face image using the decoder;
    taking the third face image as ground-truth data, and determining a loss function between the target face image and the ground-truth data;
    reversely training the first encoder, the second encoder and the decoder based on the loss function.
  8. The training method according to any one of claims 1, 2, 3, 6 and 7, wherein the step of training an occlusion detection model with the first face image as input data and with the coordinate values of the first key points and the occlusion information of the first key points as output data comprises:
    training a first neural network so that, based on the input first face image, the first neural network outputs coordinate values of predicted key points;
    selecting an output of a hidden layer of the first neural network, training a second neural network with the output of the hidden layer as input, and outputting occlusion probabilities of the predicted key points;
    determining a first loss function of the first neural network according to the coordinates of the predicted key points and the coordinate values of the first key points, and determining a second loss function of the second neural network according to the occlusion probabilities of the predicted key points and the occlusion information of the first key points;
    determining a combined loss function of the occlusion detection model according to the first loss function and the second loss function;
    performing reverse training based on the combined loss function to determine the occlusion parameters in the model.
  9. The training method according to claim 8, wherein the expression of the combined loss function is:
    loss = Σ_i [λ1·(1 - p_i)·l_i + λ2·o_i]
    where p_i denotes the occlusion probability of the i-th predicted key point, l_i denotes the first loss function of the first neural network, o_i denotes the second loss function of the second neural network, and λ1 and λ2 denote empirical parameters.
  10. A beautification processing method for a face image, comprising:
    acquiring a fifth face image to be processed;
    inputting the fifth face image into the occlusion detection model according to any one of claims 1-9, and outputting coordinate values of fifth key points in the fifth face image and occlusion probabilities of the fifth key points;
    performing beautification processing on the face image according to the occlusion probabilities.
  11. A training apparatus for an occlusion detection model, comprising:
    a sample data construction module, adapted to construct multiple pieces of training sample data, the training sample data comprising a first face image with an occluder added, coordinate values of first key points in the first face image, and occlusion information of the first key points;
    a model training module, adapted to train an occlusion detection model with the first face image as input data and with the coordinate values of the first key points and the occlusion information of the first key points as output data, so that, based on any input second face image, the occlusion detection model outputs coordinate values of second key points contained in the second face image and occlusion probabilities of the second key points.
  12. A beautification processing apparatus for a face image, comprising:
    an image acquisition module, adapted to acquire a fifth face image to be processed;
    an occlusion detection module, adapted to input the fifth face image into the occlusion detection model according to any one of claims 1-9, and output coordinate values of fifth key points in the fifth face image and occlusion probabilities of the fifth key points;
    a beautification module, adapted to perform beautification processing on the face image according to the occlusion probabilities.
  13. A computer device, comprising a memory, a processor, and computer-readable instructions stored on the memory and runnable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 9 when executing the computer-readable instructions.
  14. A computer-readable storage medium having computer-readable instructions stored thereon, wherein the computer-readable instructions, when executed by a processor, implement the steps of the method according to any one of claims 1 to 9.
PCT/CN2021/112308 2020-10-16 2021-08-12 Training method for occlusion detection model and beautification processing method for face image WO2022078041A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP21879088.9A EP4207053A4 (en) 2020-10-16 2021-08-12 TRAINING METHOD FOR OCCLUSION DETECTION MODEL AND METHOD FOR BEAUTYING FACIAL IMAGE

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011111254.2A CN112419170B (zh) 2020-10-16 2020-10-16 遮挡检测模型的训练方法及人脸图像的美化处理方法
CN202011111254.2 2020-10-16

Publications (1)

Publication Number Publication Date
WO2022078041A1 true WO2022078041A1 (zh) 2022-04-21

Family

ID=74840052

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/112308 WO2022078041A1 (zh) 2020-10-16 2021-08-12 遮挡检测模型的训练方法及人脸图像的美化处理方法

Country Status (4)

Country Link
US (1) US20230237841A1 (zh)
EP (1) EP4207053A4 (zh)
CN (1) CN112419170B (zh)
WO (1) WO2022078041A1 (zh)

Families Citing this family (7)

Publication number Priority date Publication date Assignee Title
CN112419170B (zh) * 2020-10-16 2023-09-22 上海哔哩哔哩科技有限公司 遮挡检测模型的训练方法及人脸图像的美化处理方法
CN113284041B (zh) * 2021-05-14 2023-04-18 北京市商汤科技开发有限公司 一种图像处理方法、装置、设备及计算机存储介质
CN113033524B (zh) * 2021-05-26 2021-08-17 北京的卢深视科技有限公司 遮挡预测模型训练方法、装置、电子设备及存储介质
CN113256656A (zh) * 2021-05-28 2021-08-13 北京达佳互联信息技术有限公司 图像分割方法和装置
CN113705466B (zh) * 2021-08-30 2024-02-09 浙江中正智能科技有限公司 用于遮挡场景、尤其高仿遮挡下的人脸五官遮挡检测方法
CN113762136A (zh) * 2021-09-02 2021-12-07 北京格灵深瞳信息技术股份有限公司 人脸图像遮挡判断方法、装置、电子设备和存储介质
CN113723368B (zh) * 2021-10-29 2022-07-12 杭州魔点科技有限公司 多场景兼容的人脸识别方法、装置、电子设备和存储介质


Family Cites Families (4)

Publication number Priority date Publication date Assignee Title
CN107679490B (zh) * 2017-09-29 2019-06-28 百度在线网络技术(北京)有限公司 用于检测图像质量的方法和装置
CN109960974A (zh) * 2017-12-22 2019-07-02 北京市商汤科技开发有限公司 人脸关键点检测方法、装置、电子设备及存储介质
CN108932693B (zh) * 2018-06-15 2020-09-22 中国科学院自动化研究所 基于人脸几何信息的人脸编辑补全方法及装置
CN110728330A (zh) * 2019-10-23 2020-01-24 腾讯科技(深圳)有限公司 基于人工智能的对象识别方法、装置、设备及存储介质

Patent Citations (5)

Publication number Priority date Publication date Assignee Title
CN105095856A (zh) * 2015-06-26 2015-11-25 上海交通大学 基于掩膜的有遮挡人脸识别方法
US20190371080A1 (en) * 2018-06-05 2019-12-05 Cristian SMINCHISESCU Image processing method, system and device
CN110826519A (zh) * 2019-11-14 2020-02-21 深圳市华付信息技术有限公司 人脸遮挡检测方法、装置、计算机设备及存储介质
CN111027504A (zh) * 2019-12-18 2020-04-17 上海眼控科技股份有限公司 人脸关键点检测方法、装置、设备及存储介质
CN112419170A (zh) * 2020-10-16 2021-02-26 上海哔哩哔哩科技有限公司 遮挡检测模型的训练方法及人脸图像的美化处理方法

Non-Patent Citations (1)

Title
See also references of EP4207053A4

Cited By (6)

Publication number Priority date Publication date Assignee Title
CN114897722A (zh) * 2022-04-29 2022-08-12 中国科学院西安光学精密机械研究所 一种自编码网络及基于自编码网络的波前图像复原方法
CN114897722B (zh) * 2022-04-29 2023-04-18 中国科学院西安光学精密机械研究所 一种基于自编码网络的波前图像复原方法
WO2024082950A1 (zh) * 2022-10-20 2024-04-25 广州市百果园信息技术有限公司 基于遮挡分割的三维人脸重建方法及***
CN115376196A (zh) * 2022-10-25 2022-11-22 上海联息生物科技有限公司 图像处理方法、金融隐私数据的安全处理方法及装置
CN117275075A (zh) * 2023-11-01 2023-12-22 浙江同花顺智能科技有限公司 一种人脸遮挡检测方法、***、装置和存储介质
CN117275075B (zh) * 2023-11-01 2024-02-13 浙江同花顺智能科技有限公司 一种人脸遮挡检测方法、***、装置和存储介质

Also Published As

Publication number Publication date
CN112419170B (zh) 2023-09-22
EP4207053A4 (en) 2024-02-28
CN112419170A (zh) 2021-02-26
US20230237841A1 (en) 2023-07-27
EP4207053A1 (en) 2023-07-05


Legal Events

Code Description
121  Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 21879088; Country of ref document: EP; Kind code of ref document: A1)
ENP  Entry into the national phase (Ref document number: 2021879088; Country of ref document: EP; Effective date: 20230328)
NENP Non-entry into the national phase (Ref country code: DE)