WO2022068451A1 - Style image generation method and apparatus, model training method and apparatus, device, and medium

Style image generation method and apparatus, model training method and apparatus, device, and medium

Info

Publication number
WO2022068451A1
Authority
WO
WIPO (PCT)
Prior art keywords
image, style, face, target, generation model
Application number
PCT/CN2021/113225
Other languages
French (fr)
Chinese (zh)
Inventor
尹淳骥
胡兴鸿
Original Assignee
北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Application filed by 北京字节跳动网络技术有限公司 (Beijing ByteDance Network Technology Co., Ltd.)
Publication of WO2022068451A1

Classifications

    • G06T 7/11 (Image analysis; Segmentation; Edge detection): Region-based segmentation
    • G06F 18/214 (Pattern recognition; Design or setup of recognition systems): Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06T 3/02 (Geometric image transformations in the plane of the image): Affine transformations
    • G06T 3/4023 (Scaling of whole images or parts thereof): Scaling based on decimating pixels or lines of pixels, or based on inserting pixels or lines of pixels
    • G06T 5/50 (Image enhancement or restoration): Using two or more images, e.g. averaging or subtraction
    • G06V 40/165 (Recognition of human faces): Detection; Localisation; Normalisation using facial parts and geometric relationships
    • G06T 2207/20132 (Indexing scheme; image segmentation details): Image cropping
    • G06T 2207/20221 (Indexing scheme; image combination): Image fusion; Image merging

Definitions

  • the present disclosure relates to the technical field of image processing, and in particular, to a style image generation method, a model training method, an apparatus, a device and a medium.
  • Image style conversion refers to converting the style of one or more images to generate a style image that meets user needs.
  • Training a model with the function of generating style images is currently the main way to realize image style transfer.
  • the model in the existing scheme is trained in only a single way, which cannot meet users' need to generate style images in real time.
  • the embodiments of the present disclosure provide a style image generation method, a model training method, an apparatus, a device and a medium.
  • an embodiment of the present disclosure provides a method for generating a style image, including:
  • the target style image real-time generation model is obtained as follows: after the initial style image generation model is obtained by training, at least one cropping operation is performed on the initial style image generation model according to at least one set of cropping parameters to obtain at least one style image real-time generation model, and the at least one style image real-time generation model is then trained to obtain the target style image real-time generation model. The initial style image generation model and the target style image real-time generation model are both trained based on multiple original face sample images and the target style face sample image corresponding to each original face sample image, and the style image real-time generation model changes with the cropping parameters.
  • an embodiment of the present disclosure also provides a method for training a style image generation model, including:
  • an initial style image generation model is obtained by training
  • the at least one style image real-time generation model is trained to obtain a trained target style image real-time generation model.
  • an embodiment of the present disclosure further provides an apparatus for generating a style image, including:
  • an original face image acquisition module, configured to acquire an original face image;
  • a target style face image generation module, configured to obtain a target style face image corresponding to the original face image by using the pre-trained target style image real-time generation model;
  • the target style image real-time generation model is obtained as follows: after the initial style image generation model is obtained by training, at least one cropping operation is performed on the initial style image generation model according to at least one set of cropping parameters to obtain at least one style image real-time generation model, and the at least one style image real-time generation model is then trained to obtain the target style image real-time generation model. The initial style image generation model and the target style image real-time generation model are both trained based on multiple original face sample images and the target style face sample image corresponding to each original face sample image, and the style image real-time generation model changes with the cropping parameters.
  • an embodiment of the present disclosure further provides a training device for a style image generation model, including:
  • a sample acquisition module, configured to acquire a plurality of original face sample images and a target style face sample image corresponding to each original face sample image;
  • a first training module, configured to train an initial style image generation model based on the plurality of original face sample images and the target style face sample image corresponding to each original face sample image;
  • a model cropping module, configured to perform at least one cropping operation according to at least one set of cropping parameters based on the initial style image generation model to obtain at least one style image real-time generation model, wherein the style image real-time generation model changes with the cropping parameters;
  • a second training module, configured to train the at least one style image real-time generation model based on the plurality of original face sample images and the target style face sample image corresponding to each original face sample image, to obtain a trained target style image real-time generation model.
  • an embodiment of the present disclosure further provides an electronic device, including: a processing device, and a memory for storing instructions executable by the processing device; the processing device is configured to read the executable instructions from the memory and execute them, so as to implement any style image generation method provided by the embodiments of the present disclosure, or to implement any training method for a style image generation model provided by the embodiments of the present disclosure.
  • an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processing device, implements any style image generation method provided by the embodiments of the present disclosure, or implements any training method for a style image generation model provided by the embodiments of the present disclosure.
  • after the initial style image generation model is trained based on the multiple original face sample images and the target style face sample image corresponding to each original face sample image, the initial style image generation model is cropped based on at least one set of cropping parameters, and the cropped model continues to be trained to obtain a style image real-time generation model.
  • the space occupation and computational complexity of the style image real-time generation model are smaller than those of the initial style image generation model, and it has the ability to generate style images in real time. Therefore, in the application stage of the style image real-time generation model, style images that meet the user's needs can be generated in real time by running the style image real-time generation model on the user's terminal device.
  • the style image real-time generation model changes with the cropping parameters; that is, the style image real-time generation model can be trained in different ways.
  • this solves the problems that the existing model is trained in only a single way and cannot meet users' need to generate style images in real time on their terminal devices, achieves real-time generation of style images for users, and improves the user experience of the image style conversion function.
  • in addition, different style image real-time generation models can be compatible with terminal devices of different performance levels, so that the style image generation method in the embodiment of the present disclosure can be widely applied to terminal devices of different performance levels.
  • FIG. 1 is a flowchart of a method for generating a style image according to an embodiment of the present disclosure.
  • FIG. 2 is a flowchart of another style image generation method provided by an embodiment of the present disclosure.
  • FIG. 3 is a schematic diagram of the position of a face area bounding box on a first original face image according to an embodiment of the present disclosure.
  • FIG. 4 is a flowchart of another style image generation method provided by an embodiment of the present disclosure.
  • FIG. 5 is a flowchart of another style image generation method provided by an embodiment of the present disclosure.
  • FIG. 6 is a flowchart of a method for training a style image generation model according to an embodiment of the present disclosure.
  • FIG. 7 is a flowchart of another method for training a style image generation model according to an embodiment of the present disclosure.
  • FIG. 8 is a flowchart of another method for training a style image generation model according to an embodiment of the present disclosure.
  • FIG. 9 is a flowchart of another method for training a style image generation model according to an embodiment of the present disclosure.
  • FIG. 10 is a schematic diagram of a mouth material provided by an embodiment of the present disclosure.
  • FIG. 11 is a schematic structural diagram of a style image generating apparatus according to an embodiment of the present disclosure.
  • FIG. 12 is a schematic structural diagram of a training device for a style image generation model according to an embodiment of the present disclosure.
  • FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
  • FIG. 1 is a flowchart of a method for generating a style image provided by an embodiment of the present disclosure.
  • the embodiment of the present disclosure can be applied to a situation in which a style image of any style is generated based on an original face image.
  • the styles mentioned in the embodiments of the present disclosure may refer to image effects, such as Japanese comic style, European and American comic style, oil painting style, sketch style, or cartoon style, etc., which may be determined according to the classification of image styles in the field of image processing.
  • the original face image may refer to any image including a face region.
  • the original face image may be an image captured by a device with a capturing function, or an image drawn by a drawing technology.
  • the style image generation method provided by the embodiments of the present disclosure may be executed by a style image generation apparatus, which may be implemented by software and/or hardware, and may be integrated on any electronic device with computing capabilities, such as a terminal, etc.
  • the terminal may include, but is not limited to, smart mobile terminals, tablet computers, personal computers, etc.
  • the style image generating apparatus can be implemented as an independent application program or an applet integrated on a public platform, or as a functional module integrated in an application program or applet that has a style image generation function.
  • the programs may include, but are not limited to, video interactive applications or video interactive applets.
  • the style image generation method provided by the embodiment of the present disclosure may include:
  • an image stored in the terminal may be uploaded or an image or video may be captured in real time by an image capturing device of the terminal.
  • the terminal may acquire the original face image to be processed according to the user's image selection operation, image capture operation or image upload operation in the terminal.
  • the target style image real-time generation model is obtained by training at least one style image real-time generation model, which is in turn obtained by performing at least one cropping operation on the initial style image generation model according to at least one set of cropping parameters after the initial style image generation model has been trained.
  • the initial style image generation model and the target style image real-time generation model are both trained based on multiple original face sample images and target style face sample images corresponding to each original face sample image.
  • the style image real-time generation model changes with the cropping parameters.
  • the first cropping parameters of the initial style image generation model may be obtained, and based on the first cropping parameters, at least one cropping operation is performed on the initial style image generation model.
  • taking the first cropping parameter being the first importance factor of the activation layer as an example, the first importance factor of the activation layer in the initial style image generation model can be obtained, and according to the first importance factor, the activation layer in the initial style image generation model and the convolution layer corresponding to the activation layer are cropped to obtain at least one style image real-time generation model; the at least one style image real-time generation model is then trained further to obtain the trained target style image real-time generation model.
  • in this way, by performing the cropping operation once, one style image real-time generation model is obtained.
  • at least two cropping operations are performed on the initial style image generation model to obtain the first style image real-time generation model and the second style image real-time generation model.
  • for example, an initial style image generation model is trained based on multiple original face sample images and the target style face sample image corresponding to each original face sample image, and the first importance factor of the activation layer in the initial style image generation model is obtained.
  • based on the first importance factor, the activation layer in the initial style image generation model and the convolution layer corresponding to the activation layer are cropped to obtain the first style image real-time generation model; then, based on the multiple original face sample images and the target style face sample images corresponding to the original face sample images, the first style image real-time generation model is trained to obtain the trained first style image real-time generation model.
  • next, the second importance factor of the activation layer in the trained first style image real-time generation model is obtained; based on the second importance factor, the activation layer of the trained first style image real-time generation model and the convolution layer corresponding to the activation layer are cropped to obtain the second style image real-time generation model.
  • then, the second style image real-time generation model is trained to obtain the trained second style image real-time generation model.
  • both the trained first style image real-time generation model and the trained second style image real-time generation model can be used as target style image real-time generation models, and both have the function of generating style images in real time.
  • optionally, at least two cropping operations are performed based on the initial style image generation model to correspondingly obtain at least two style image real-time generation models, and at least two target style image real-time generation models are obtained by training the at least two style image real-time generation models, where the at least two target style image real-time generation models correspond to different device performance information respectively.
  • correspondingly, before using the pre-trained target style image real-time generation model to obtain the target style face image corresponding to the original face image, the method further includes: based on current device performance information, acquiring the target style image real-time generation model adapted to the current device performance information.
  • for example, after the server receives a model acquisition request or a model delivery request from the terminal device, it can match a model according to the current device performance information of the terminal device carried in the request, and send the matched target style image real-time generation model to the terminal device.
  • the current device performance information of the terminal device may include, but is not limited to, storage space usage information of the terminal device, processor running indicators, and other information that can be used to measure the current running performance of the terminal device.
  • if the current device performance information indicates that the terminal device can bear the initial style image generation model, the initial style image generation model can be sent to the terminal device; otherwise, a target style image real-time generation model is sent to the terminal device.
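  • for illustration only, the following is a minimal sketch of such server-side dispatch; the request fields, thresholds and tier keys are hypothetical assumptions, not a protocol specified by the patent:

```python
# Minimal sketch of dispatching a model by device performance (names are
# illustrative; the patent does not specify a concrete request format).
def select_model_for_device(request, initial_model, realtime_models):
    perf = request["device_performance"]  # e.g. storage usage, processor load
    if perf["free_storage_mb"] > 500 and perf["cpu_load"] < 0.5:
        # device can bear the full model: send the initial generation model
        return initial_model
    # otherwise send the cropped real-time model matched to this performance tier
    return realtime_models[perf["tier"]]
```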
  • the initial style image generation model or the target style image real-time generation model may include a conditional generative adversarial network (CGAN) model, a cycle-consistent generative adversarial network (CycleGAN) model, or any other network model that supports non-aligned training, which is not specifically limited in this embodiment of the present disclosure.
  • the target style image real-time generation model is obtained by cropping based on the initial style image generation model; its space occupation and computational complexity are smaller than those of the initial style image generation model, and it has the function of generating style images in real time. Therefore, in the application stage of the target style image real-time generation model, the target style image real-time generation model can be used to generate style images that meet user needs in real time.
  • the embodiment of the present disclosure solves the problems that the existing model has a single training method and cannot meet the needs of users to generate style images in real time, realizes the effect of real-time generation of style images for users, and improves the user experience of using the image style conversion function; Moreover, different target style image real-time generation models can be compatible with terminal devices with different performances, so that the style image generation method in the embodiment of the present disclosure can be widely applied to terminal devices with different performances.
  • multiple original face sample images and a target-style face sample image corresponding to each original face sample image are the input and output of the pre-trained target image model, respectively.
  • the target image model has the function of generating style images and is used to generate style image samples in the process of training the initial style image generation model and the target style image real-time generation model, so that the sample data used for training the initial style image generation model and the target style image real-time generation model is consistent, which reduces the training difficulty of the target style image real-time generation model.
  • the target image model may include any network model that supports non-aligned training, such as a conditional generative adversarial network (CGAN) model or a cycle-consistent generative adversarial network (CycleGAN) model, which is not specifically limited in the embodiment of the present disclosure.
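  • as a minimal sketch of this pairing step, assuming `target_image_model` is a callable wrapper around the pre-trained target image model (the name is illustrative):

```python
# Build (input, supervision target) pairs used to train both the initial style
# image generation model and the style image real-time generation model.
def build_training_pairs(original_face_samples, target_image_model):
    pairs = []
    for face in original_face_samples:
        styled = target_image_model(face)  # target style face sample image
        pairs.append((face, styled))
    return pairs
```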
  • FIG. 2 is a flowchart of another style image generation method provided by an embodiment of the present disclosure, which is further optimized and expanded based on the above-mentioned technical solution, and can be combined with each of the above-mentioned optional embodiments.
  • the style image generation method may include:
  • any available face recognition technology can be used to identify the face region of the original face image and output the parameter information of the bounding box surrounding the face region on the original face image, that is, the parameter information of the face region bounding box.
  • the key point detection technology is used to determine the key points of the face region, and the rotation angle of the face region is then determined based on the key points.
  • the parameter information of the bounding box of the face region includes the position of the bounding box on the original face image. Further, the parameter information of the face region bounding box may also include the size and shape of the face region bounding box.
  • the size of the bounding box of the face area can be determined according to the parameters set in the adopted face recognition technology, or can be customized.
  • the face area bounding box can be any regular geometric figure; the rotation angle of the face area refers to the angle by which the face area should be rotated on the original face image in order to obtain an image that meets the preset face position requirements.
  • by using the key point detection technology to obtain the rotation angle of the face region while the face region is recognized, the angle can be used directly in face alignment, which saves the complex operation of determining the affine transformation matrix for face region position adjustment by the least squares method or the singular value decomposition (SVD) method, improves the efficiency of face position adjustment, and thus enables real-time face position adjustment.
  • the parameter information of the bounding box of the face region may include the position of the bounding box on the original face image.
  • the position of the bounding box on the original face image can be represented by the position coordinates of each vertex of the face region bounding box on the original face image, or the distance of each edge from the image boundary on the original face image.
  • an affine transformation matrix for adjusting the position of the face region can be constructed based on the parameter information of the face region bounding box and the rotation angle of the face region, with reference to existing affine transformation principles; the position of the face region is then adjusted using the matrix to obtain an image that meets the preset face position requirements, that is, the first face image.
  • the preset face position requirement may be: after the face region position is adjusted, the face region is located in the central region of the entire image; or, after the face region position is adjusted, the facial features of the face region are located at specific positions in the entire image; or, after the face region position is adjusted, the face region and the background region (the remaining image region after the face region is removed from the entire image) occupy the entire image in a preset proportion.
  • the position adjustment operation of the face region may include, but is not limited to, rotation, translation, reduction, enlargement and cropping. According to the actual position of the face region bounding box on the original face image and the preset face position requirements, at least one position adjustment operation can be flexibly selected to adjust the position of the face region until an image that meets the preset face position requirements is obtained. In the process of adjusting the position of the face region on the original face image, the position of the original face image can be adjusted as a whole, or matting can be applied to the bounding box containing the face region or to a sub-region containing the face region, so that the position of the face region bounding box or sub-region is adjusted independently, which is not specifically limited in the embodiment of the present disclosure.
  • in this way, normalization preprocessing of the original face image is realized, which ensures the generation effect of subsequent style images.
  • the target style image can be further processed flexibly according to the style image processing requirements, such as image background fusion requirements, face position recovery requirements, etc.
  • the style image generation method provided by the embodiment of the present disclosure further includes:
  • the position of the target face region in the target style face image is adjusted to obtain the first style face image, in which the face position corresponds to the position of the face region in the original face image; that is, the position of the target face region in the target style face image is restored to a position consistent with the position of the face region in the original face image, thereby reducing the positional difference between the target style face image and the face region on the original face image.
  • for example, the inverse matrix M′ of the affine transformation matrix M can be obtained, and using the inverse matrix M′, the position of the target face region in the target style face image is adjusted to obtain the first style face image.
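  • a minimal sketch of this restoration step with OpenCV, assuming M is the 2x3 alignment matrix M described in this document (a construction sketch appears after the FIG. 4 embodiment below):

```python
import cv2

def restore_face_position(target_style_face, M, orig_size):
    """Map the target style face back to the original face position.
    M is the 2x3 alignment matrix; orig_size is (width, height)."""
    M_inv = cv2.invertAffineTransform(M)  # inverse matrix M'
    return cv2.warpAffine(target_style_face, M_inv, orig_size)
```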
  • the style image generation method provided by the embodiment of the present disclosure further includes:
  • the target face area in the first style face image is fused with the target background area to obtain the second style face image.
  • the target background region (that is, the remaining image region other than the face region) may be the background region of the original face image, or a background region processed by a background processing algorithm, such as the background region on the target style face image, etc.; on the basis of ensuring that a style image with a good display effect is provided for the user, the embodiment of the present disclosure does not impose a specific limitation. Blending with the target background region optimizes the display effect of the final style image.
  • any available image fusion technology may be used to perform fusion processing on the target face region and the target background region in the face image of the first style.
  • taking fusing the target face region in the first style face image with the background region of the original face image to obtain the second style face image as an example, apart from the change of image style, the other image features and image details of the second style face image remain consistent with the original face image; finally, the second style face image can be displayed to the user.
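  • a minimal sketch of such a fusion, assuming `face_mask` is a soft matte of the target face region with values in [0, 1] (how the mask is obtained is an assumption here, not specified by the excerpt):

```python
import numpy as np

def fuse_with_background(first_style_face, target_background, face_mask):
    mask = face_mask[..., None].astype(np.float32)  # broadcast mask over RGB channels
    blended = mask * first_style_face.astype(np.float32) \
        + (1.0 - mask) * target_background.astype(np.float32)
    return blended.astype(np.uint8)  # second style face image
```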
  • in the embodiment of the present disclosure, the position of the face region is first adjusted on the original face image to be processed, and the pre-trained target style image real-time generation model is then used to obtain the corresponding target style face image in real time, which improves the generation effect of the style image and solves the problem of poor image effect after image style conversion in the existing solution; moreover, in the embodiment of the present disclosure, the rotation angle of the face region can be obtained while the face region is recognized and used directly in face position adjustment (also called face alignment), which improves the efficiency of face position adjustment and thus enables real-time face position adjustment.
  • the position of the face region is adjusted based on the parameter information of the bounding box of the face region and the rotation angle of the face region to obtain the first face image, including:
  • a preset face position correction parameter value and a preset image size are acquired, and based on these together with the parameter information of the face region bounding box and the rotation angle of the face region, the position of the face region is adjusted to obtain the first face image.
  • an affine transformation matrix may be constructed based on the acquired parameters, and then the position of the face region may be adjusted based on the affine transformation matrix.
  • the face position correction parameter value is used to correct the position of the face region on the position-adjusted image, which may include correction of the up-down position of the face or correction of the left-right position of the face, so as to improve the accuracy of determining the actual position of the face region on the original face image, thereby ensuring the accuracy of the face region position adjustment.
  • for example, if the vertical position of the face region determined based on the parameter information of the face region bounding box deviates from its actual position on the original face image, the preset face position correction parameter value can be used to accurately determine the actual position of the face region.
  • the preset image size refers to the predetermined size of images input to the style image generation model; that is, if the original face image does not meet the preset image size, the original face image also needs to be cropped.
  • for example, the rotation angle of the face region determined by the key point detection technology can be expressed as Roll; the face position correction parameter value can be expressed as ymeanScale, whose value range can be set to [0, 1]; and the preset image size can be expressed as targetSize.
  • the parameter information of the face region bounding box includes the distance between each edge of the bounding box and the boundary of the original face image. Taking FIG. 3 as an example, assume that the lower left corner of the original face image is used as the image origin: the distances of the two horizontal sides of the face region bounding box from the x-axis can be expressed as a first distance b and a second distance t, and the distances of the two vertical sides of the face region bounding box from the y-axis can be expressed as a third distance l and a fourth distance r.
  • the affine transformation matrix used to adjust the position of the face region can be expressed as a 2x3 matrix M built from these parameters; its entries are not reproduced in this excerpt, and a sketch of one plausible construction follows the FIG. 4 embodiment below.
  • FIG. 4 is a flowchart of another style image generation method provided by an embodiment of the present disclosure, which is further optimized and expanded based on the foregoing technical solution, and may be combined with the foregoing optional implementation manners.
  • the same operations exist in FIG. 4 and FIG. 2, which will not be repeated below; reference may be made to the descriptions of the foregoing embodiments.
  • the style image generation method may include:
  • the four sides of the face area bounding box are parallel to the four sides of the original face image, and the parameter information of the face area bounding box includes the position parameters of the four sides in the original face image;
  • the face region bounding box can be any regular geometric figure, for example, a square.
  • the position representation of the face region bounding box on the original face image can be simplified.
  • S303 Acquire a preset face position correction parameter value and a preset image size.
  • the value of the face position correction parameter is used to correct the position of the face region on the position-adjusted image.
  • S304 Calculate the abscissa value of the center of the face region based on the position parameters in the horizontal direction corresponding to the four sides of the bounding box of the face region.
  • S305 Calculate the ordinate value of the center of the face region based on the position parameters in the vertical direction corresponding to the four sides of the bounding box of the face region and the value of the face position correction parameter.
  • the position parameters in the horizontal direction corresponding to the four sides of the face region bounding box may include the third distance l and the fourth distance r, and the position parameters in the vertical direction corresponding to the four sides of the square may include the first distance b and the second distance t.
  • yMean = ymeanScale × t + (1 - ymeanScale) × b.
  • the face cropping ratio edgeScale is used to indicate the cropping multiple of the bounding box of the face area on the original face image.
  • a face cropping ratio of 2 means that, on the original face image, an image area containing the face region is cropped at 2 times the size of the face region bounding box.
  • the side length of the face region bounding box can be expressed as the difference (r - l) between the fourth distance r and the third distance l, or the difference (t - b) between the second distance t and the first distance b.
  • the edge length value edgeLength of the face crop can then be expressed as: edgeLength = edgeScale × (r - l).
  • on this basis, the affine transformation matrix M can be constructed from Roll, targetSize, the face region center (xMean, yMean) and edgeLength; the matrix entries themselves are not reproduced in this excerpt.
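  • since the matrix did not survive extraction, the following is a minimal sketch of one plausible construction under stated assumptions: M rotates by Roll about the face center (xMean, yMean), scales the crop of side edgeLength to targetSize, and translates the face center to the center of the output image. Note that OpenCV uses a top-left image origin whereas FIG. 3 uses a bottom-left origin, so signs may need adjusting in practice; this is not the patent's literal matrix:

```python
import cv2

def build_alignment_matrix(l, r, t, b, roll, ymean_scale, edge_scale, target_size):
    """Plausible reconstruction of the 2x3 matrix M, not the patent's exact entries."""
    x_mean = (l + r) / 2.0                              # face-center abscissa (S304)
    y_mean = ymean_scale * t + (1.0 - ymean_scale) * b  # face-center ordinate (S305)
    edge_length = edge_scale * (r - l)                  # side length of the face crop
    scale = target_size / edge_length
    # Rotate by Roll about the face center and scale the crop to targetSize.
    M = cv2.getRotationMatrix2D((x_mean, y_mean), roll, scale)
    # Translate so the face center lands at the center of the target image.
    M[0, 2] += target_size / 2.0 - x_mean
    M[1, 2] += target_size / 2.0 - y_mean
    return M

# usage: first_face = cv2.warpAffine(original_face, M, (targetSize, targetSize))
```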
  • in the embodiment of the present disclosure, the affine transformation matrix required for adjusting the position of the face region is constructed according to the cropping, scaling and other requirements for the original face image, and the matrix is used to adjust the position of the face region on the original face image, ensuring the accuracy of the face region adjustment; the pre-trained target style image real-time generation model is then used to obtain the corresponding target style face image in real time, which improves the style image generation effect and solves the problem of poor image effect after image style conversion in the existing scheme.
  • a pre-trained target style image is used to generate a model in real time to obtain a target style face image corresponding to the first face image, including:
  • gamma correction is performed on the first face image according to a preset gamma value to obtain a second face image; the maximum pixel value on the second face image is then determined, and all pixel values on the second face image are normalized by the currently determined maximum pixel value to obtain a third face image;
  • the third face image is input into the target style image real-time generation model, and the target style face image corresponding to the third face image is obtained.
  • gamma correction can also be called gamma nonlinearization or gamma coding, which is used to perform nonlinear operations or inverse operations on the luminance or tristimulus values of light in a film or imaging system.
  • Gamma-correcting images can compensate for the characteristics of human vision, thereby maximizing the use of data bits or bandwidth representing black and white based on human perception of light or black and white.
  • the preset gamma value may be set in advance, which is not specifically limited in the embodiment of the present disclosure. For example, the pixel values of the three RGB channels on the first face image are simultaneously corrected with a gamma value of 1/1.5.
  • the specific implementation of gamma correction can be implemented with reference to the principles of the prior art.
  • after the above processing, a face image with a more balanced brightness distribution can be obtained, which reduces facial defects, avoids unsatisfactory style image results caused by an unbalanced image brightness distribution, and makes the presentation of the obtained target style image more stable.
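  • a minimal sketch of this preprocessing chain (the 1/1.5 gamma follows the example above; the floating-point pipeline is an implementation assumption):

```python
import numpy as np

def preprocess_face(first_face: np.ndarray, gamma: float = 1 / 1.5) -> np.ndarray:
    x = first_face.astype(np.float32) / 255.0
    second_face = np.power(x, gamma)               # gamma-correct all three RGB channels
    max_val = max(float(second_face.max()), 1e-6)  # current maximum pixel value
    third_face = second_face / max_val             # normalize by the maximum pixel value
    return third_face                              # input to the real-time generation model
```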
  • FIG. 5 is a flowchart of another style image generation method provided by an embodiment of the present disclosure, which is used to exemplarily illustrate an embodiment of the present disclosure.
  • as shown in FIG. 5, a user image is obtained first, and matting processing technology can be used to extract the face region on the user image; then, based on the affine transformation matrix determination method in the above embodiments, the affine transformation matrix used to adjust the position of the face region on the user image is determined, and the matrix is used to adjust the position of the face region (that is, the face alignment processing in FIG. 5); after the target style image is generated, the inverse transformation matrix is used to adjust the position of the face region on the target style image to restore the face position, the restored face region is fused with the background region of the user image, and finally the background-fused style image is fed back to the user.
  • FIG. 6 is a flowchart of a training method for a style image generation model provided by an embodiment of the present disclosure.
  • the embodiment of the present disclosure can be applied to the situation of how to train a style image generation model that meets the style conversion requirements.
  • the style image generation model is used to generate style images corresponding to original face images.
  • the image style mentioned in the embodiments of the present disclosure may refer to image effects, such as Japanese comic style, European and American comic style, oil painting style, sketch style, or cartoon style, etc., which may be determined according to the classification of image styles in the field of image processing.
  • the training apparatus for the style image generation model provided by the embodiments of the present disclosure may be implemented by software and/or hardware, and may be integrated on any electronic device with computing capabilities, such as a server, etc.
  • the training method of the style image generation model may include:
  • the sample images in the model training process can be obtained from an open image database. Using multiple original face sample images and a target style face sample image corresponding to each original face sample image in the model training of the embodiment of the present disclosure ensures the consistency of the sample data, thereby laying the foundation for a better model training effect.
  • optionally, obtaining multiple original face sample images and the target style face sample image corresponding to each original face sample image includes: based on the multiple original face sample images, using a pre-trained target image model to obtain the target style face sample image corresponding to each original face sample image respectively.
  • the target image model has the function of generating style images and is used to generate style image samples in the process of training the initial style image generation model and the style image real-time generation model, so that the sample data used for the subsequent training of the initial style image generation model and the style image real-time generation model is consistent, which reduces the training difficulty of the style image real-time generation model.
  • the target image model may include any network model that supports non-aligned training, such as a conditional generative adversarial network (CGAN) model or a cycle-consistent generative adversarial network (CycleGAN) model, which is not specifically limited in the embodiment of the present disclosure.
  • the target image model is trained based on the style face sample images obtained by using the image generation model.
  • the image generation model may include a generative adversarial network (GAN) model, and the specific implementation principle may refer to the prior art.
  • the training process of the target image model may include: acquiring multiple standard style face sample images, and training a standard image generation model based on the multiple standard style face sample images; using the standard image generation model to generate multiple style face sample images for training the target image model; and training the target image model based on the style face sample images used for training the target image model.
  • the aforesaid standard style face sample images may be obtained by having professional painters draw style images for a preset number (the value may be determined according to training requirements) of original face sample images according to the current image style requirements.
  • an initial style image generation model is obtained by training based on a plurality of original face sample images and a target style face sample image corresponding to each original face sample image.
  • the initial style image generation model has the function of style image generation.
  • the initial style image generation model may include a conditional generative adversarial network (CGAN) model, a cycle-consistent generative adversarial network (CycleGAN) model, or any other network model that supports non-aligned training, which is not specifically limited in the embodiment of the present disclosure.
  • the first cropping parameters of the initial style image generation model may be acquired, and based on the first cropping parameters, at least one cropping operation is performed on the initial style image generation model to obtain at least one style image real-time generation model.
  • the first cropping parameter is used to measure the importance of functional modules or neural network layers in the initial style image generation model.
  • the function module or neural network layer corresponding to the first cropping parameter that is smaller than the preset parameter threshold can be cropped to obtain a real-time generation model of style images.
  • the first cropping parameter may include, but is not limited to, the first importance factor of the activation layer in the initial style image generation model; according to the first importance factor, the activation layer in the initial style image generation model and the convolution layer corresponding to the activation layer are cropped; for example, the activation layer whose first importance factor is smaller than the preset parameter threshold and the convolution layer corresponding to that activation layer can be cropped to obtain a style image real-time generation model.
  • the style image real-time generation model is obtained by cropping the initial style image generation model. Compared with the initial style image generation model, the storage space occupation and computational complexity of the style image real-time generation model are reduced, which lowers the performance requirements on the terminal device while the model runs and enables the function of generating style images in real time.
  • the style image real-time generation model is of the same type as the initial style image generation model and may likewise include any network model that supports non-aligned training, such as a conditional generative adversarial network (CGAN) model or a cycle-consistent generative adversarial network (CycleGAN) model, which is not specifically limited.
  • a style image real-time generation model that meets the style image generation requirements can be obtained.
  • the training process of the initial style image generation model (large model) and the style image real-time generation model (small model) amounts to a large-and-small-model training strategy: because the style image real-time generation model is obtained on the basis of the trained initial style image generation model and the sample data used is consistent, the training difficulty of the style image real-time generation model can be greatly reduced; supervising the features of the real-time model (for example, with features of the large model) further accelerates the training of the style image real-time generation model.
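  • as a rough illustration of that feature-supervision idea (the mean-squared form and the teacher/student framing are assumptions here; the patent does not give a concrete loss):

```python
import torch.nn.functional as F

# Guide the cropped real-time model (student) with features of the trained
# initial model (teacher) at matching layers.
def feature_supervision_loss(student_feats, teacher_feats, weight=1.0):
    return weight * F.mse_loss(student_feats, teacher_feats.detach())
```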
  • that is, the style image real-time generation model is obtained by cropping based on the initial style image generation model.
  • the embodiment of the present disclosure solves the problems that the existing model has a single training method and cannot meet the needs of users to generate style images in real time, realizes the effect of real-time generation of style images for users, and improves the user experience of using the image style conversion function.
  • the real-time generation model of the style image can be obtained by performing the training and cropping operations of the model one or more times.
  • performing at least one cropping operation based on the initial style image generation model includes: performing at least two cropping operations based on the initial style image generation model to obtain a first style image real-time generation model and a second style image real-time generation model.
  • the first style image real-time generation model and the second style image real-time generation model are trained based on the multiple original face sample images and the target style face sample image corresponding to each original face sample image, so as to obtain a first target style image real-time generation model and a second target style image real-time generation model, where the first target style image real-time generation model and the second target style image real-time generation model correspond to different device performance information respectively.
  • At least two cropping operations are performed based on the initial style image generation model to obtain the first style image real-time generation model and the second style image real-time generation model, including:
  • acquiring first cropping parameters of the initial style image generation model, and cropping the initial style image generation model based on the first cropping parameters to obtain the first style image real-time generation model;
  • acquiring second cropping parameters of the trained first style image real-time generation model, where the second cropping parameters are used to measure the importance of functional modules or neural network layers in the first style image real-time generation model;
  • cropping the trained first style image real-time generation model based on the second cropping parameters to obtain the second style image real-time generation model.
  • the number of times of cyclic execution of the model cropping operation may be determined according to the model training requirements, which is not specifically limited in the embodiment of the present disclosure.
  • the trained first style image real-time generation model and the trained second style image real-time generation model, etc. can both be used as style image real-time generation models, and have the function of real-time generation of style images.
  • the first style image real-time generation model, the second style image real-time generation model and other style image real-time generation models can correspond to different device performance information respectively, so that a style image real-time generation model adapted to a terminal device's performance information can be delivered to that terminal device. That is, different style image real-time generation models can be compatible with terminal devices of different performance levels, so that the style image generation method in the embodiment of the present disclosure can be widely applied to terminal devices of different performance levels.
  • optionally, acquiring the first cropping parameters of the initial style image generation model includes: acquiring the first importance factor of the activation layer in the initial style image generation model;
  • cropping the initial style image generation model to obtain the first style image real-time generation model includes: based on the first importance factor, cropping the activation layer in the initial style image generation model and the convolution layer corresponding to the activation layer to obtain the first style image real-time generation model;
  • acquiring the second cropping parameters of the trained first style image real-time generation model includes: acquiring the second importance factor of the activation layer in the trained first style image real-time generation model;
  • cropping the trained first style image real-time generation model to obtain the second style image real-time generation model includes: based on the second importance factor, cropping the activation layer of the trained first style image real-time generation model and the convolution layer corresponding to the activation layer to obtain the second style image real-time generation model.
  • using different original face sample images as model training input, the multiple importance factors obtained for the activation layer in the trained initial style image generation model can differ; the average of the multiple importance factors can be used as the first importance factor of the activation layer in the initial style image generation model. Similarly, using different original face sample images as model training input, the multiple importance factors of the activation layer in the trained first style image real-time generation model can also differ, and the average of the multiple importance factors can be used as the second importance factor of the activation layer in the trained first style image real-time generation model.
  • optionally, obtaining the first importance factor of the activation layer in the initial style image generation model includes: performing a Taylor expansion calculation on the output value of the activation layer in the initial style image generation model, and using the calculation result as the first importance factor; obtaining the second importance factor of the activation layer in the trained first style image real-time generation model includes: performing a Taylor expansion calculation on the output value of the activation layer in the trained first style image real-time generation model, and using the calculation result as the second importance factor.
  • for example, an initial style image generation model (large model) can be trained first; the first-order Taylor expansion of each activation layer's output value at the end of training is then calculated to estimate the importance of each activation layer; according to the calculated first importance factor, the unimportant activation layers and the corresponding convolution layers are cropped, and training continues to obtain the first style image real-time generation model; by the same method, the first style image real-time generation model is cropped and training continues to obtain the second style image real-time generation model.
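  • as a rough illustration of this first-order Taylor criterion (the channel-level granularity and the |activation × gradient| form follow the common pruning literature and are assumptions here, not the patent's literal formula):

```python
import torch

# Per-channel importance of an activation layer's output for one batch:
# first-order Taylor term |a * dL/da|, averaged over batch and spatial dims.
def taylor_importance(activation: torch.Tensor, grad: torch.Tensor) -> torch.Tensor:
    # activation, grad: (N, C, H, W) activation output and its gradient
    return (activation * grad).abs().mean(dim=(0, 2, 3))

# Channels scoring below a threshold would be cropped together with the
# corresponding convolution filters before training continues.
def channels_to_crop(importance: torch.Tensor, threshold: float) -> torch.Tensor:
    return torch.nonzero(importance < threshold).flatten()
```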
  • FIG. 7 is a flowchart of another training method for a style image generation model provided by an embodiment of the present disclosure, which is further optimized and expanded based on the above technical solution, and can be combined with the above-mentioned optional embodiments.
  • the processing of the original face sample image belongs to the same inventive concept as the processing of the original face image described above, differing only in the image being processed, and is not detailed in the following embodiments.
  • FIG. 7 and FIG. 6 have the same operations, which will not be repeated hereafter, and reference may be made to the descriptions of the foregoing embodiments.
  • the training method of the style image generation model may include:
  • S702 Identify the face region of the original face sample image, and determine parameter information of the bounding box of the face region and the rotation angle of the face region.
  • any available face recognition technology can be used to identify the face region of the original face sample image and output the bounding box surrounding the face region on the original face sample image; at the same time, the key point detection technology is used to determine the key points of the face region and the rotation angle of the face region.
  • the rotation angle of the face region refers to the angle at which the face region should be rotated on the original face sample image in order to obtain an image that meets the preset face position requirements;
  • the parameter information of the face region bounding box is used to represent the position of the bounding box on the original face sample image; the size of the face region bounding box can be determined according to the parameters set in the adopted face recognition technology, or can be customized.
  • the bounding box of the face region can be any regular geometric figure.
  • by using the key point detection technology to obtain the rotation angle of the face region while the face region is recognized, the angle can be used directly in face alignment, which saves the complex operation of determining the affine transformation matrix for face region position adjustment by the least squares method or the singular value decomposition (SVD) method, improves the efficiency of face position adjustment, and thus enables real-time face position adjustment.
  • the parameter information of the face region bounding box may include, but is not limited to, the position coordinates of each vertex of the face region bounding box on the original face sample image, or the distance of each edge from the image boundary on the original face sample image, etc.
  • an affine transformation matrix for adjusting the position of the face region can be constructed, and the position of the face region on the original face sample image is adjusted to obtain an image that meets the preset face position requirements, that is, the first face sample image.
  • the position adjustment operation of the face region may include, but is not limited to, rotation, translation, reduction, enlargement, and cropping.
  • the position of the original face sample image can be adjusted as a whole, or matting can be applied to the bounding box containing the face region or to a sub-region containing the face region, so that the position of the face region bounding box or sub-region is adjusted independently, which is not specifically limited in this embodiment of the present disclosure.
  • S704 Obtain a target-style face sample image corresponding to each first face sample image.
  • a target-style face sample image corresponding to each original face sample image may be obtained by using a pre-trained target image model based on a plurality of first face sample images.
  • the target style face sample image corresponding to the first face sample image is used as a training sample, which improves the training effect of the initial style image generation model and the style image real-time generation model, solves the problems that the existing model is trained in only a single way and cannot meet users' need to generate style images in real time, and at the same time improves the generation effect of style images in the model application stage, solving the problem of poor image effect after image style conversion in the existing scheme.
  • moreover, the rotation angle of the face region can be obtained while the face region is recognized and used directly in face alignment, which improves the efficiency of face position adjustment, enables real-time face position adjustment, and improves the efficiency of model training.
  • adjusting the position of the face region based on the parameter information of the face region bounding box and the rotation angle of the face region to obtain a first face sample image includes: obtaining a preset face position correction parameter value and a preset image size, and adjusting the position of the face region based on the parameter information of the face region bounding box, the rotation angle of the face region, the face position correction parameter value and the preset image size, to obtain the first face sample image.
  • an affine transformation matrix may be constructed based on the acquired parameters, and then the position of the face region may be adjusted based on the affine transformation matrix.
  • the face position correction parameter value is used to correct the position of the face region on the position-adjusted image, which may include correction of the upper and lower positions of the face or correction of the left and right positions of the face, so as to improve the accuracy of determining the actual position of the face region on the original face sample image, thereby ensuring the accuracy of the face region position adjustment. For example, if the vertical position of the face region determined based on the parameter information of the face region bounding box is higher than its actual position on the original face sample image, the preset face position correction parameter value can be used to accurately determine the actual position of the face region.
  • the preset image size refers to the input image size pre-determined for the model training process; that is, if the original face sample image does not meet the preset image size, it needs to be cropped to ensure that the sample images finally used in the model training process are of uniform size.
  • the rotation angle of the face region determined by the key point detection technology can be expressed as Roll
  • the value of the face position correction parameter can be expressed as ymeanScale
  • the value range of ymeanScale can be set to [0, 1]
  • the preset image size can be expressed as targetSize
  • the parameter information of the bounding box of the face area includes the distance between each edge of the bounding box and the boundary of the original face sample image.
  • the distance between the two sides of the face area bounding box in the horizontal direction from the x-axis can be expressed as the first distance b and the second distance t
  • the distance between the two sides of the face area bounding box in the vertical direction from the y-axis can be expressed as a third distance l and a fourth distance r.
  • yMean = ymeanScale × t + (1 − ymeanScale) × b;
  • the affine transformation matrix used to adjust the position of the face region can be expressed as a 2×3 matrix M.
  • FIG. 8 is a flowchart of another method for training a style image generation model according to an embodiment of the present disclosure, which is further optimized and expanded based on the above technical solution, and can be combined with the above-mentioned optional embodiments.
  • FIG. 8 contains operations that are the same as those in FIG. 6 or FIG. 7, which are not repeated below; reference may be made to the descriptions of the above embodiments.
  • the training method of the style image generation model may include:
  • the four sides of the face area bounding box are parallel to the four sides of the original face sample image, and the parameter information of the face area bounding box includes position parameters of the four sides in the original face sample image.
  • the value of the face position correction parameter is used to correct the position of the face region on the position-adjusted image.
  • S804 Calculate the abscissa value of the center of the face region based on the position parameters in the horizontal direction corresponding to the four sides of the bounding box of the face region.
  • S805 Calculate the ordinate value of the center of the face region based on the position parameters in the vertical direction corresponding to the four sides of the bounding box of the face region and the value of the face position correction parameter.
  • the face cropping ratio edgeScale is used to indicate the cropping multiple of the bounding box of the face region on the original face sample image.
  • a face cropping ratio of 2 means that, on the original face sample image, an image region containing the face region is cropped at 2 times the size of the face region bounding box.
  • the side length value of the face region bounding box can be expressed as the difference (r−l) between the third distance l and the fourth distance r, or the difference (t−b) between the first distance b and the second distance t.
  • the edge length value edgeLength of the face area can be expressed as:
  • edgeLength = edgeScale × (r − l).
  • the affine transformation matrix M combines the following parameters:
  • Roll represents the rotation angle of the face area determined by the key point detection technology
  • targetSize represents the preset image size
  • (xMean, yMean) represents the coordinates of the center of the face area.
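The explicit form of M appears as a formula image in the original publication and is not reproduced here. The sketch below shows one plausible construction consistent with the parameters described above (Roll, targetSize, ymeanScale, edgeScale, and the bounding-box distances t, b, l, r), using OpenCV; it is a hedged reconstruction under stated assumptions, not necessarily the patent's exact matrix.

```python
import cv2

def build_alignment_matrix(t, b, l, r, roll, ymean_scale, edge_scale, target_size):
    """Build a 2x3 affine matrix that rotates, scales and re-centers a face.

    t, b, l, r   : bounding-box edge positions (top/bottom distances from the
                   x-axis, left/right distances from the y-axis)
    roll         : face rotation angle in degrees from keypoint detection
    ymean_scale  : face position correction parameter in [0, 1]
    edge_scale   : face cropping ratio (e.g. 2 crops 2x the bounding box)
    target_size  : preset side length of the model input image
    """
    x_mean = 0.5 * (l + r)                              # horizontal center
    y_mean = ymean_scale * t + (1.0 - ymean_scale) * b  # corrected vertical center
    edge_length = edge_scale * (r - l)                  # cropped region side length
    scale = target_size / edge_length                   # zoom to the preset size

    # Rotate by `roll` around the face center while scaling.
    M = cv2.getRotationMatrix2D((x_mean, y_mean), roll, scale)
    # Translate so the face center lands at the center of the output image.
    M[0, 2] += target_size / 2.0 - x_mean
    M[1, 2] += target_size / 2.0 - y_mean
    return M

# aligned = cv2.warpAffine(image, M, (target_size, target_size))
```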
  • a target-style face sample image corresponding to each original face sample image may be obtained by using a pre-trained target image model based on a plurality of first face sample images.
  • an affine transformation matrix required for adjusting the position of the face region is constructed according to the cropping, scaling and other requirements of the original face sample image, and the position of the face region on the original face sample image is adjusted, ensuring the accuracy of the face region position adjustment; the multiple first face sample images and the target style face sample image corresponding to each first face sample image are used as training samples, which improves the training effect, solves the problem that existing models have a single training method and cannot meet users' needs for real-time generation of style images, and at the same time improves the generation effect of style images in the model application stage, solving the problem of poor image effect after image style conversion in existing schemes.
  • FIG. 9 is a flowchart of another method for training a style image generation model provided by an embodiment of the present disclosure, which is further optimized and expanded based on the above technical solution, and can be combined with the above-mentioned optional embodiments.
  • the operations in FIG. 9 and FIG. 6 are the same, which will not be described in detail below. Reference may be made to the descriptions of the above embodiments.
  • the training method of the style image generation model may include:
  • Face adjustment refers to adjusting the face of the person on the target-style face sample image according to the display requirements for the face shape of the person.
  • the face adjustment includes at least one of the following: face shape adjustment and mouth adjustment.
  • face shape adjustment refers to adjusting the face shape on the target style face sample image according to the display requirements for the character's face, such as face-thinning adjustment; mouth adjustment refers to adjusting the character's mouth on the image, such as adjusting the shape of the mouth or controlling the thickness of the mouth outline to be consistent. That is, in this embodiment of the present disclosure, face fine-tuning is supported on the target style face sample image, so that the presentation effect of the target style face sample image is more attractive, thereby ensuring that the trained initial style image generation model and style image real-time generation model are more accurate and can output a style image with a good display effect for any input image.
  • the display effect of facial features is optimized and the construction of high-quality sample data is realized, which improves the training effect of the initial style image generation model and the style image real-time generation model, thereby ensuring the generation effect of style images in the model application stage.
  • face shape adjustment is performed on the face region on the target style face sample image, including:
  • the face contour of the face region on the target style face sample image is adjusted to obtain the first style face sample image.
  • the key points of the initial face contour can be obtained by using the key point detection technology to perform key point detection on the face region on the target style face sample image.
  • the key points of the target face contour are determined according to the face shape adjustment requirements. According to the translation transformation between the initial face contour key point and the target face contour key point, the initial face contour key point is moved to the target face contour key point, so as to realize face adjustment.
  • the face contour of the face region on the target-style face sample image is adjusted, including:
  • the deformed face region is rendered using the face texture of the target style face sample image to obtain the first style face sample image.
  • the thin plate spline (TPS) interpolation function is a two-dimensional deformation processing algorithm; its specific principle can be implemented with reference to the prior art.
  • Using the thin-plate spline interpolation function to deform the face region can ensure the smoothness of the face contour after face adjustment.
  • Using the face texture of the target-style face sample image to render the deformed face region can ensure the consistency of the face texture after face adjustment.
  • using the thin-plate spline interpolation function to deform the face region may specifically include:
  • the vertices of the triangulation network are translated using a thin-plate spline interpolation function.
  • the entire style image area can also be triangulated; triangulation has the advantage of convenient calculation and processing, and other image meshing methods can also be adopted in practical applications.
  • the face area on the target-style face sample image or the entire target-style face sample image may be triangulated, and the initial face contour key points of the face area on the target-style face sample image are determined.
  • the thin-plate spline interpolation function is used to interpolate the translation amount from the initial face contour key point L1 to the target face contour key point L2 to each triangular mesh vertex.
  • the vertices are translated, and finally the face texture on the target style face sample image is used as the current texture to render the new triangular mesh, obtaining the target style face sample image after face shape adjustment (e.g., face thinning).
  • the risk of face deformation can be reduced to a certain extent, and the overall presentation effect of the face can be maintained.
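As a hedged illustration of the thin-plate spline deformation step, the sketch below warps a face image so that the initial contour keypoints move toward the target contour keypoints, using OpenCV's shape module (shipped with opencv-contrib-python). The patent does not mandate this library; the helper and its inputs are assumptions.

```python
import cv2
import numpy as np

def tps_warp_face(image, src_points, dst_points):
    """Warp `image` so that src_points (initial face contour keypoints) move
    toward dst_points (target face contour keypoints) using a thin-plate
    spline. Requires opencv-contrib-python (cv2 shape module).
    """
    src = np.asarray(src_points, np.float32).reshape(1, -1, 2)
    dst = np.asarray(dst_points, np.float32).reshape(1, -1, 2)
    matches = [cv2.DMatch(i, i, 0) for i in range(src.shape[1])]

    tps = cv2.createThinPlateSplineShapeTransformer()
    # warpImage uses backward mapping, so the target shape is passed first.
    tps.estimateTransformation(dst, src, matches)
    return tps.warpImage(image)
```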
  • performing mouth adjustment on the face region on the target-style face sample image including:
  • the mouth determined based on the mouth key points is removed from the face region of the target style face sample image to obtain an incomplete-style face sample image; for example, when the mouth state is determined to be open, the mouth determined by the mouth key points is removed from the face region of the target style face sample image to obtain the incomplete-style face sample image;
  • the pre-generated mouth material is fused with the incomplete style face sample image to obtain the first style face sample image.
  • the key points of the mouth can also be obtained by using the key point detection technology to perform key point detection on the face area on the target-style face sample image.
  • the mouth state can be determined according to the distance between keypoints belonging to the upper and lower lips. For example, among the keypoints of the upper and lower lips, if the number of keypoint pairs whose distance between corresponding upper and lower keypoints exceeds a distance threshold is larger than a number threshold, the mouth state is considered open; otherwise it is considered closed. Both the distance threshold and the number threshold can be set adaptively. If it is determined that the mouth is open, in order to ensure the display effect of the mouth, pre-designed mouth material is used to replace the mouth on the target style face sample image.
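A minimal sketch of the open/closed decision described above, assuming paired upper/lower lip keypoints from any detector; the threshold values are placeholders to be tuned:

```python
import numpy as np

def mouth_is_open(upper_lip_pts, lower_lip_pts, dist_thresh=5.0, count_thresh=3):
    """Decide the mouth state from paired upper/lower lip keypoints.

    upper_lip_pts / lower_lip_pts: (N, 2) arrays of corresponding keypoints.
    The mouth is considered open when the number of keypoint pairs whose gap
    exceeds dist_thresh is larger than count_thresh; both thresholds can be
    set adaptively per keypoint detector.
    """
    upper = np.asarray(upper_lip_pts, np.float32)
    lower = np.asarray(lower_lip_pts, np.float32)
    gaps = np.linalg.norm(upper - lower, axis=1)
    return int((gaps > dist_thresh).sum()) > count_thresh
```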
  • the mouth determined based on the key points of the mouth is removed from the face region of the target-style face sample image to obtain the incomplete-style face sample image, including:
  • a sub-region surrounding the mouth is determined in the face region of the target-style face sample image; wherein, the size of the sub-region can be determined adaptively, which is not specifically limited in the embodiment of the present disclosure;
  • the fixed boundary solution algorithm refers to an algorithm used in the field of image processing to determine the boundary of a target figure (such as a mouth), for example an edge detection algorithm based on the Laplace operator; it can be implemented with reference to the prior art.
  • the boundary conditions in the calculation process are determined according to the key points included on the boundary of the sub-region, that is, according to the key points of the face skin on the boundary of the sub-region;
  • the mouth is removed from the face region of the target style face sample image to obtain the incomplete style image.
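One simple way to realize the fixed-boundary solution described above is to solve the Laplace equation over the removed mouth pixels, with the surrounding skin pixels held fixed as boundary conditions. The following sketch uses plain Jacobi iteration and assumes the mask lies strictly inside the sub-region; it is an illustrative stand-in, not the patent's prescribed solver.

```python
import numpy as np

def laplace_fill(region, mask, iters=500):
    """Fill the masked (removed mouth) pixels of `region` by iteratively
    solving the Laplace equation, with the unmasked pixels of the sub-region
    acting as fixed boundary conditions.

    region: (H, W) or (H, W, C) float array of the sub-region around the mouth
    mask:   (H, W) boolean array, True where the mouth was removed; assumed
            to lie strictly inside the sub-region (np.roll wraps at borders)
    """
    filled = region.astype(np.float32).copy()
    for _ in range(iters):
        # Jacobi iteration: each unknown pixel becomes its 4-neighbor average.
        avg = 0.25 * (np.roll(filled, 1, axis=0) + np.roll(filled, -1, axis=0) +
                      np.roll(filled, 1, axis=1) + np.roll(filled, -1, axis=1))
        filled[mask] = avg[mask]  # fixed-boundary pixels are never overwritten
    return filled
```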
  • the pre-generated mouth material is fused with the incomplete-style face sample image, including:
  • the deformed mouth material is rendered using the mouth texture of the target-style face sample image.
  • the key points marked on the mouth material and the mouth key points in the face region of the target style face sample image have a corresponding relationship; for example, the coordinates of the key points are determined in the same image coordinate system. Aligning the key points marked on the mouth material with the mouth key points in the face region of the target style face sample image realizes the key point mapping between the mouth material and the mouth in the face region, so that the mouth material can be pasted back onto the mouth area of the incomplete-style sample image.
  • Using the thin-plate spline interpolation function to deform the mouth material can ensure the smoothness of the border of the mouth material and ensure the display effect of the mouth.
  • fusing the pre-generated mouth material with the incomplete-style image includes:
  • the mouth material is deformed based on the thin-plate spline interpolation function, including:
  • the area between the inner boundary line and the outer boundary line is deformed using the solution optimization algorithm.
  • FIG. 10 is a schematic diagram of a mouth material provided by an embodiment of the present disclosure, specifically showing the inner boundary line and outer boundary line of the mouth edge outline; the inner boundary line and the outer boundary line can be filled with appropriate colors as required.
  • the inner mesh refers to the mesh obtained by meshing the area within the inner boundary line of the mouth material; the outer mesh refers to the mesh obtained by meshing the area between the inner boundary line and the outer boundary line. Both the inner mesh and the outer mesh may be triangulated meshes.
  • the deformation control of the outer mesh can be implemented based on an as-rigid-as-possible constraint without rotation, and the deformation control of the inner mesh can still be implemented based on the thin-plate spline interpolation function.
  • the inner mesh can be deformed using the thin-plate spline interpolation function, and the vertices of the outer mesh can then be obtained by solving the optimization problem.
  • the area of the outer mesh can also be determined, so as to realize the fusion of the mouth material and the incomplete style image by controlling the area between the inner boundary line and the outer boundary line.
  • the width of the area between the inner boundary line and the outer boundary line remains unchanged during the deformation process, that is, the thickness of the outline on the edge of the mouth remains unchanged.
  • in the optimization formula, u represents an unknown vertex and I is a 2×2 identity matrix.
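The optimization formula itself is an image in the original publication. One plausible reading, consistent with the description of u as an unknown vertex and I as a 2×2 identity matrix (rotation fixed to the identity, i.e., as-rigid-as-possible without rotation), is a translation-only rigidity energy over the outer-mesh edges:

```latex
E(u) \;=\; \sum_{(i,j)\,\in\,\mathcal{E}}
\bigl\lVert (u_i - u_j) - I\,(v_i - v_j) \bigr\rVert^{2}
```

Here the v_i denote the rest-pose vertices of the outer mesh and the set E its edges; minimizing E(u), with the inner-boundary vertices fixed to the thin-plate spline result, yields the deformed outer-mesh vertices while keeping edge vectors (and hence the outline thickness) as unchanged as possible.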
  • the embodiment of the present disclosure also needs to control the thickness of the outline on the edge of the mouth when the mouth is in the closed state, which can also be implemented by the above method.
  • the above mouth adjustment operations are all implemented based on the target style face sample image, to optimize the display effect of the mouth on the target style face sample image, thereby optimizing the training effect of the initial style image generation model and the style image real-time generation model.
  • FIG. 11 is a schematic structural diagram of an apparatus for generating a style image provided by an embodiment of the present disclosure.
  • the embodiment of the present disclosure may be applicable to a situation in which a style image of any style is generated based on an original face image.
  • the apparatus can be implemented by software and/or hardware, and can be integrated on any electronic device with computing capabilities, such as a terminal, which may include, but is not limited to, a smart mobile terminal, a tablet computer, a personal computer, and the like.
  • the style image generation apparatus 1100 may include an original face image acquisition module 1101 and a target style face image generation module 1102, wherein:
  • the original face image acquisition module 1101 is used to acquire an original face image;
  • the target style face image generation module 1102 is used to obtain, by using a pre-trained target style image real-time generation model, the target style face image corresponding to the original face image;
  • the target style image real-time generation model is obtained by training, after the initial style image generation model is obtained by training, at least one style image real-time generation model obtained by performing at least one cropping operation on the initial style image generation model according to at least one set of cropping parameters; and
  • the initial style image generation model and the target style image real-time generation model are both obtained by training based on multiple original face sample images and the target style face sample image corresponding to each original face sample image, wherein the style image real-time generation model varies with the cropping parameters.
  • the multiple original face sample images and the target style face sample image corresponding to each original face sample image are respectively the input and output of a pre-trained target image model; the target image model is used to generate the target style face sample image corresponding to each original face sample image, providing training samples for the initial style image generation model and the target style image real-time generation model.
  • at least two cropping operations are performed according to at least two sets of cropping parameters based on the initial style image generation model to obtain at least two style image real-time generation models accordingly, and the at least two style image real-time generation models are trained to obtain at least two target style image real-time generation models, which respectively correspond to different device performance information;
  • the style image generating apparatus 1100 provided by the embodiment of the present disclosure further includes:
  • the model obtaining module is used for obtaining the real-time generation model of the target style image adapted to the current equipment performance information based on the current equipment performance information.
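A minimal sketch of the model obtaining module's behavior, assuming device performance has been bucketed into tiers; the tier names and file names are hypothetical, not from the patent.

```python
# Hypothetical mapping from device performance tiers to pruned model variants.
PRUNED_MODELS = {
    "high": "style_gen_full.model",     # lightly cropped, best quality
    "mid":  "style_gen_pruned1.model",  # one cropping pass
    "low":  "style_gen_pruned2.model",  # two cropping passes, fastest
}

def select_model(device_tier: str) -> str:
    """Return the target style image real-time generation model adapted to
    the current device performance tier."""
    return PRUNED_MODELS.get(device_tier, PRUNED_MODELS["low"])
```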
  • the style image generating apparatus 1100 provided by the embodiment of the present disclosure further includes:
  • the face recognition module is used to identify the face area of the original face image, and determine the parameter information of the bounding box of the face area and the rotation angle of the face area;
  • the face position adjustment module is used to adjust the position of the face region based on the parameter information of the face region bounding box and the rotation angle of the face region to obtain a first face image, so as to obtain the target style face image based on the first face image.
  • the face position adjustment module includes:
  • the first parameter obtaining unit is used to obtain the preset face position correction parameter value and the preset image size; wherein, the face position correction parameter value is used to correct the position of the face region on the position-adjusted image;
  • the first face image determination unit is used to adjust the position of the face area based on the parameter information of the bounding box of the face area, the rotation angle of the face area, the face position correction parameter value and the preset image size, and obtain the first person face image.
  • the four sides of the face area bounding box are parallel to the four sides of the original face image, and the parameter information of the face area bounding box includes the position parameters of the four sides in the original face image;
  • the first face image determination unit includes:
  • the first coordinate calculation subunit is used to calculate the abscissa value of the center of the face area based on the position parameters in the horizontal direction corresponding to the four sides of the bounding box of the face area;
  • the second coordinate calculation subunit is used to calculate the ordinate value of the center of the face area based on the position parameter in the vertical direction and the face position correction parameter value corresponding to the four sides of the bounding box of the face area;
  • the affine transformation matrix construction subunit is used to construct an affine transformation matrix based on the abscissa value of the center of the face area, the ordinate value of the center of the face area, the rotation angle of the face area and the preset image size;
  • the first face image determination subunit is used to adjust the position of the face region based on the affine transformation matrix to obtain the first face image.
  • the face position adjustment module further includes:
  • the face cropping ratio acquisition unit is used to obtain the preset face cropping ratio
  • the side length value determination unit of the face area is used to calculate the side length value of the face area based on the proportion of face cropping and the side length value of the bounding box of the face area;
  • the scaling size value determining unit is used for calculating the scaling size value based on the side length value of the face area and the preset image size.
  • the affine transformation matrix construction subunit is specifically used for:
  • An affine transformation matrix is constructed based on the abscissa value of the center of the face area, the ordinate value of the center of the face area, the rotation angle of the face area, the preset image size and the scaling size value.
  • the style image generating apparatus 1100 provided by the embodiment of the present disclosure further includes:
  • the target face area acquisition module is used to obtain the target face area in the target style face image
  • the first style face image determination module is used to adjust the position of the target face area in the target style face image, and obtain the first style face image corresponding to the position of the face area in the original face image.
  • the style image generating apparatus 1100 provided by the embodiment of the present disclosure further includes:
  • the second style face image determination module is configured to perform fusion processing on the target face area and the target background area in the first style face image to obtain the second style face image.
  • the style image generating apparatus provided by the embodiment of the present disclosure can execute any style image generating method provided by the embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the execution method.
  • FIG. 12 is a schematic structural diagram of a training device for a style image generation model provided by an embodiment of the present disclosure.
  • the embodiment of the present disclosure can be applied to the situation of how to train a style image generation model that meets style conversion requirements; the style image generation model is used to generate style images corresponding to original face images.
  • the training device can be implemented by software and/or hardware, and can be integrated on any electronic device with computing capability, such as a server.
  • the training apparatus 1200 for the style image generation model may include a sample acquisition module 1201, a first training module 1202, a model cropping module 1203, and a second training module 1204, wherein:
  • a sample acquisition module 1201 configured to acquire a plurality of original face sample images and a target style face sample image corresponding to each original face sample image;
  • the first training module 1202 is used for training to obtain an initial style image generation model based on a plurality of original face sample images and a target style face sample image corresponding to each original face sample image;
  • the model cropping module 1203 is configured to perform at least one cropping operation according to at least one set of cropping parameters based on the initial style image generation model to obtain at least one style image real-time generation model, which changes with the change of the cropping parameters;
  • the second training module 1204 is configured to train the at least one style image real-time generation model based on the multiple original face sample images and the target style face sample image corresponding to each original face sample image, to obtain a trained target style image real-time generation model.
  • the model cropping module 1203 includes: a first cropping parameter acquisition unit, configured to acquire the first cropping parameter of the initial style image generation model;
  • a first cropping unit configured to crop the initial style image generation model based on the first cropping parameter to obtain the first style image real-time generation model
  • the second cropping parameter obtaining unit is used to obtain the second cropping parameter of the real-time generation model of the first style image after training;
  • the second cropping unit is configured to crop the trained first style image real-time generation model based on the second cropping parameter to obtain the second style image real-time generation model.
  • the first cropping parameter obtaining unit is specifically configured to: obtain the first importance factor of the activation layer in the initial style image generation model;
  • the first cropping unit is specifically used for:
  • the activation layer in the initial style image generation model and the convolution layer corresponding to the activation layer are cropped based on the first importance factor to obtain the first style image real-time generation model;
  • the second cropping parameter obtaining unit is specifically configured to: obtain the second importance factor of the activation layer in the trained first style image real-time generation model;
  • the second cropping unit is specifically used for:
  • the activation layer of the trained first style image real-time generation model and the convolution layer corresponding to the activation layer are cropped based on the second importance factor to obtain the second style image real-time generation model.
  • the first cropping parameter obtaining unit is specifically configured to: perform Taylor expansion calculation on the output value of the activation layer in the initial style image generation model, and use the calculation result as the first importance factor;
  • the second cropping parameter obtaining subunit is specifically used for:
  • the Taylor expansion calculation is performed on the output value of the activation layer in the trained first style image real-time generation model, and the calculation result is used as the second importance factor.
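As a hedged illustration of the first-order Taylor importance described above, the sketch below accumulates |activation × gradient| per output channel of a chosen activation layer, assuming a PyTorch model with NCHW convolutional activations; the averaging scheme and helper names are assumptions, not the patent's exact procedure.

```python
import torch

def taylor_importance(model, layer, data_loader, loss_fn):
    """Rank the output channels of `layer` by a first-order Taylor estimate
    of their contribution to the loss: importance ~ |activation * gradient|,
    accumulated over the data. Assumes NCHW convolutional activations.
    """
    saved = {}

    def hook(module, inputs, output):
        output.retain_grad()  # keep the gradient of this non-leaf tensor
        saved["act"] = output

    handle = layer.register_forward_hook(hook)
    scores = None
    for x, y in data_loader:
        model.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        act = saved["act"]
        # First-order Taylor term per channel, averaged over batch and space.
        s = (act * act.grad).abs().mean(dim=(0, 2, 3)).detach()
        scores = s if scores is None else scores + s
    handle.remove()
    return scores  # lower scores suggest channels that are safer to crop
```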
  • the first style image real-time generation model and the second style image real-time generation model are trained based on the multiple original face sample images and the target style face sample image corresponding to each original face sample image, so as to obtain a first target style image real-time generation model and a second target style image real-time generation model respectively, wherein the first target style image real-time generation model and the second target style image real-time generation model respectively correspond to different device performance information.
  • the sample acquisition module 1201 includes:
  • an original face sample image acquisition unit used for acquiring multiple original face sample images
  • the target-style face sample image acquisition unit is used for obtaining target-style face sample images corresponding to each original face sample image by using the pre-trained target image model.
  • the target image model is obtained by training based on the style face sample images generated by the standard image generation model.
  • the standard image generation model is trained on multiple standard-style face sample images.
  • the training apparatus 1200 for the style image generation model provided by the embodiment of the present disclosure further includes:
  • the face adjustment module is used to perform face adjustment on the face region on the target style face sample image to obtain a first style face sample image, so that the multiple original face sample images and the obtained multiple first style face sample images are used for training to obtain the initial style image generation model and the trained style image real-time generation model.
  • the face adjustment includes face shape adjustment and/or mouth adjustment.
  • the face adjustment module includes a face shape adjustment unit for performing face shape adjustment on the face region on the target style face sample image;
  • the face shape adjustment unit includes:
  • the key point determination subunit is used to determine the initial face contour key points of the face region on the target style face sample image, and the target face contour key points corresponding to the initial face contour key points; wherein the target face contour key points are determined according to the face shape adjustment requirements;
  • the face shape adjustment subunit is used to adjust the face contour of the face region on the target style face sample image based on the initial face contour key points and the target face contour key points to obtain the first style face sample image.
  • the face shape adjustment subunit includes:
  • the key point moving subunit is used to move the initial face contour key point to the target face contour key point, and use the thin plate spline interpolation function to deform the face area on the target style face sample image;
  • the image rendering subunit is used for rendering the deformed face region by using the face texture of the target style face sample image, so as to obtain the first style face sample image.
  • the face adjustment module includes a mouth adjustment unit for performing mouth adjustment on the face region on the target-style face sample image;
  • the mouth adjustment unit includes:
  • the mouth key point determination subunit is used to determine the mouth key points of the face area on the target style face sample image
  • the incomplete-style face sample image determination subunit is used to remove the mouth determined based on the key points of the mouth from the face area of the target-style face sample image to obtain the incomplete-style face sample image;
  • the first style face sample image determination subunit is used for fusing the pre-generated mouth material with the incomplete style face sample image to obtain the first style face sample image.
  • the subunit for determining the incomplete-style face sample image includes:
  • the sub-region determination sub-unit is used to determine the sub-region surrounding the mouth in the face region of the target-style face sample image based on the key points of the mouth;
  • the mouth boundary determination subunit is used to determine the mouth boundary line in the subregion by using the fixed boundary solution algorithm
  • the mouth removal subunit is used to remove the mouth from the face area of the target style face sample image based on the mouth boundary line to obtain the incomplete style face sample image.
  • the first style face sample image determination subunit includes:
  • the key point alignment and deformation subunit is used to align the key points marked on the mouth material with the mouth key points in the face region of the target style face sample image, and to perform deformation processing on the mouth material based on the thin-plate spline interpolation function;
  • the image rendering subunit is used to render the deformed mouth material using the mouth texture of the target-style face sample image.
  • the first style face sample image determination subunit further includes:
  • the inner and outer boundary determination subunits are used to determine the inner and outer boundary lines of the mouth contour on the mouth material
  • the key point alignment and deformation subunit includes:
  • the key point alignment sub-unit is used to align the key points of the mouth in the face region of the target-style face sample image based on the key points marked on the mouth material;
  • the first deformation subunit is used to deform the area within the inner boundary line of the mouth material by using the thin-plate spline interpolation function
  • the second deformation subunit is used to perform deformation processing on the area between the inner boundary line and the outer boundary line by using the solution optimization algorithm.
  • the training apparatus 1200 for the style image generation model provided by the embodiment of the present disclosure further includes:
  • the face recognition module is used to identify the face area of the original face sample image, and to determine the parameter information of the bounding box of the face area and the rotation angle of the face area;
  • the first face sample image determination module is used to adjust the position of the face area based on the parameter information of the bounding box of the face area and the rotation angle of the face area to obtain the first face sample image.
  • the target-style face sample image acquisition unit is specifically configured to obtain a target-style face sample image corresponding to each original face sample image by using a pre-trained target image model based on a plurality of first face sample images.
  • the first face sample image determination module includes:
  • the first parameter obtaining unit is used to obtain the preset face position correction parameter value and the preset image size; wherein, the face position correction parameter value is used to correct the position of the face region on the position-adjusted image;
  • the first face sample image determination unit is used to adjust the position of the face region based on the parameter information of the face region bounding box, the rotation angle of the face region, the face position correction parameter value and the preset image size, to obtain the first face sample image.
  • the four sides of the face area bounding box are parallel to the four sides of the original face sample image, and the parameter information of the face area bounding box includes the position parameters of the four sides in the original face sample image;
  • the first face sample image determination unit includes:
  • the first coordinate calculation subunit is used to calculate the abscissa value of the center of the face area based on the position parameters in the horizontal direction corresponding to the four sides of the bounding box of the face area;
  • the second coordinate calculation subunit is used to calculate the ordinate value of the center of the face area based on the position parameter in the vertical direction and the face position correction parameter value corresponding to the four sides of the bounding box of the face area;
  • the affine transformation matrix construction subunit is used to construct an affine transformation matrix based on the abscissa value of the center of the face area, the ordinate value of the center of the face area, the rotation angle of the face area and the preset image size;
  • the position adjustment subunit is used to adjust the position of the face region based on the affine transformation matrix to obtain the first face sample image.
  • the first face sample image determination module further includes:
  • the face cropping ratio acquisition unit is used to obtain the preset face cropping ratio
  • the side length value determination unit of the face area is used to calculate the side length value of the face area based on the proportion of face cropping and the side length value of the bounding box of the face area;
  • a scaling size value determination unit used for calculating the scaling size value based on the side length value of the face area and the preset image size
  • the affine transformation matrix construction subunit is specifically used for:
  • An affine transformation matrix is constructed based on the abscissa value of the center of the face area, the ordinate value of the center of the face area, the rotation angle of the face area, the preset image size and the scaling size value.
  • the apparatus for training a style image generation model provided by the embodiment of the present disclosure can execute the training method for an arbitrary style image generation model provided by the embodiment of the present disclosure, and has functional modules and beneficial effects corresponding to the execution method.
  • FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure, which is used to exemplarily illustrate the electronic device for executing a style image generation method or a training method for a style image generation model in an example of the present disclosure.
  • the electronic devices in the embodiments of the present disclosure may include, but are not limited to, such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablets), PMPs (portable multimedia players), vehicle-mounted terminals (eg, mobile terminals such as in-vehicle navigation terminals), etc., and stationary terminals such as digital TVs, desktop computers, and the like.
  • the electronic device shown in FIG. 13 is only an example, and should not impose any limitation on the function and scope of use of the embodiments of the present disclosure.
  • the electronic device 1300 may include a processing device (e.g., a central processing unit, a graphics processor, etc.) 1301, which may execute various appropriate actions and processes according to a program stored in a read-only memory (ROM) 1302 or a program loaded from a storage device 1308 into a random access memory (RAM) 1303.
  • RAM random access memory
  • in the RAM 1303, various programs and data necessary for the operation of the electronic device 1300 are also stored.
  • the processing device 1301, the ROM 1302, and the RAM 1303 are connected to each other through a bus 1304.
  • An input/output (I/O) interface 1305 is also connected to bus 1304 .
  • the ROM 1302, RAM 1303 and storage device 1308 shown in FIG. 13 may be collectively referred to as a memory for storing executable instructions or programs of the processing device 1301.
  • the following devices can be connected to the I/O interface 1305: an input device 1306 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, a gyroscope, etc.; an output device 1307 including, for example, a liquid crystal display (LCD), a speaker, a vibrator, etc.; a storage device 1308 including, for example, a magnetic tape, a hard disk, etc.; and a communication device 1309. The communication device 1309 may allow the electronic device 1300 to communicate wirelessly or by wire with other devices to exchange data.
  • although FIG. 13 shows an electronic device 1300 having various devices, it should be understood that not all of the illustrated devices are required to be implemented or provided; more or fewer devices may alternatively be implemented or provided.
  • embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • the computer program may be downloaded and installed from the network via the communication device 1309, or from the storage device 1308, or from the ROM 1302.
  • when the computer program is executed by the processing device 1301, the above-mentioned functions defined in the methods of the embodiments of the present disclosure are executed.
  • the computer-readable medium mentioned above in the present disclosure may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the above two.
  • the computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • a computer-readable signal medium may include a data signal in baseband or propagated as part of a carrier wave, carrying computer-readable program code therein. Such propagated data signals may take a variety of forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing.
  • a computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can transmit, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device .
  • Program code embodied on a computer readable medium may be transmitted using any suitable medium including, but not limited to, electrical wire, optical fiber cable, RF (radio frequency), etc., or any suitable combination of the foregoing.
  • the client and server can communicate using any currently known or future developed network protocol such as HTTP (HyperText Transfer Protocol), and can interconnect with digital data communication (e.g., a communication network) in any form or medium.
  • HTTP HyperText Transfer Protocol
  • Examples of communication networks include local area networks (LANs), wide area networks (WANs), the Internet (eg, the Internet), and peer-to-peer networks (eg, ad hoc peer-to-peer networks), as well as any currently known or future developed networks.
  • the above-mentioned computer-readable medium may be included in the above-mentioned electronic device; or may exist alone without being assembled into the electronic device.
  • the computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: acquire an original face image; obtain, by using a pre-trained target style image real-time generation model, a target style face image corresponding to the original face image; wherein the target style image real-time generation model is obtained by training, after an initial style image generation model is obtained by training, at least one style image real-time generation model obtained by performing at least one cropping operation on the initial style image generation model according to at least one set of cropping parameters, and the initial style image generation model and the target style image real-time generation model are both obtained by training based on multiple original face sample images and the target style face sample image corresponding to each original face sample image, wherein the style image real-time generation model changes with the change of the cropping parameters.
  • a computer-readable medium carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device causes the electronic device to: acquire a plurality of original face sample images and compare them with each target style face sample images corresponding to the original face sample images; based on the multiple original face sample images and the target style face sample images corresponding to each original face sample image, the initial style image is obtained by training generating a model; performing at least one cropping operation on the initial style image generation model according to at least one set of cropping parameters to obtain a style image real-time generation model, and the style image real-time generation model changes with the change of the cropping parameters; based on The multiple original face sample images and the target style face sample images corresponding to each original face sample image are trained on the style image real-time generation model to obtain a trained target style image real-time generation model .
  • the electronic device can also be caused to execute other style image generation methods or other training methods for style image generation models provided by the embodiments of the present disclosure.
  • computer program code for performing operations of the present disclosure may be written in one or more programming languages, including but not limited to object-oriented programming languages, such as Java, Smalltalk, C++, and also conventional procedural programming languages, such as the "C" language or similar programming languages.
  • the program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server.
  • the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (eg, using an Internet service provider to connect).
  • LAN local area network
  • WAN wide area network
  • each block in the flowchart or block diagrams may represent a module, segment, or portion of code that contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the blocks may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved.
  • each block of the block diagrams and/or flowchart illustrations, and combinations of blocks in the block diagrams and/or flowchart illustrations, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or can be implemented by a combination of dedicated hardware and computer instructions.
  • the modules or units involved in the embodiments of the present disclosure may be implemented in software or hardware.
  • the name of a module or unit does not constitute a limitation of the module or unit itself in some cases; for example, the original face image acquisition module can also be described as "a module for acquiring original face images".
  • exemplary types of hardware logic components include: Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chips (SOCs), Complex Programmable Logical Devices (CPLDs) and more.
  • FPGAs Field Programmable Gate Arrays
  • ASICs Application Specific Integrated Circuits
  • ASSPs Application Specific Standard Products
  • SOCs Systems on Chips
  • CPLDs Complex Programmable Logical Devices
  • a machine-readable medium may be a tangible medium that may contain or store a program for use by or in connection with the instruction execution system, apparatus or device.
  • the machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium.
  • machine-readable media may include, but are not limited to, electronic, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any suitable combination of the foregoing.
  • machine-readable storage media would include one or more wire-based electrical connections, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), erasable programmable read only memory (EPROM or flash memory), fiber optics, compact disk read only memory (CD-ROM), optical storage, magnetic storage, or any suitable combination of the foregoing.
  • RAM random access memory
  • ROM read only memory
  • EPROM or flash memory erasable programmable read only memory
  • CD-ROM compact disk read only memory

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Geometry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Embodiments of the present application relate to a style image generation method and apparatus, a model training method and apparatus, a device, and a medium. The style image generation method comprises: obtaining an original face image; using a pre-trained target style image real-time generation model to obtain a target style face image corresponding to the original face image, wherein the target style image real-time generation model is obtained by training, after training an initial style image generation model, at least one style image real-time generation model obtained by performing at least one cropping operation on the initial style image generation model according to at least one set of cropping parameters. The style image real-time generation model changes along with changes in the cropping parameters. The embodiments of the present application can solve the problems that there are a few existing model training modes and the existing model training modes cannot meet the needs of users to generate style images in real time, and achieve the effect of generating style images in real time for users.

Description

Style image generation method, model training method, apparatus, device and medium

This application claims priority to the Chinese patent application No. 202011066405.7, filed with the State Intellectual Property Office on September 30, 2020 and entitled "Style Image Generation Method, Model Training Method, Apparatus, Device and Medium", the entire contents of which are incorporated herein by reference.

Technical Field

The present disclosure relates to the technical field of image processing, and in particular, to a style image generation method, a model training method, an apparatus, a device and a medium.

Background

At present, as the functions of video interactive applications become increasingly rich, image style conversion has become a new and interesting way of playing. Image style conversion refers to performing style conversion on one or more images to generate style images that meet user needs.

Training a model with a style image generation function is currently the main way to realize image style conversion. However, the models in existing schemes are trained in a single, fixed way, which cannot meet users' needs for real-time generation of style images.
Summary of the Invention

In order to solve the above technical problems, or at least partially solve the above technical problems, the embodiments of the present disclosure provide a style image generation method, a model training method, an apparatus, a device and a medium.

In a first aspect, an embodiment of the present disclosure provides a style image generation method, including:

acquiring an original face image;

obtaining, by using a pre-trained target style image real-time generation model, a target style face image corresponding to the original face image;

wherein the target style image real-time generation model is obtained by training, after an initial style image generation model is obtained by training, at least one style image real-time generation model obtained by performing at least one cropping operation on the initial style image generation model according to at least one set of cropping parameters; and the initial style image generation model and the target style image real-time generation model are both obtained by training based on multiple original face sample images and a target style face sample image corresponding to each original face sample image, wherein the style image real-time generation model changes with the change of the cropping parameters.
In a second aspect, an embodiment of the present disclosure further provides a training method for a style image generation model, including:

acquiring multiple original face sample images and a target style face sample image corresponding to each original face sample image;

training, based on the multiple original face sample images and the target style face sample image corresponding to each original face sample image, to obtain an initial style image generation model;

performing at least one cropping operation according to at least one set of cropping parameters based on the initial style image generation model to obtain at least one style image real-time generation model, wherein the style image real-time generation model changes with the change of the cropping parameters;

training the at least one style image real-time generation model based on the multiple original face sample images and the target style face sample image corresponding to each original face sample image, to obtain a trained target style image real-time generation model.
In a third aspect, an embodiment of the present disclosure further provides a style image generation apparatus, including:

an original face image acquisition module, configured to acquire an original face image; and

a target style face image generation module, configured to use a pre-trained target style image real-time generation model to obtain a target style face image corresponding to the original face image;

wherein the target style image real-time generation model is obtained by training at least one style image real-time generation model, the at least one style image real-time generation model being obtained, after an initial style image generation model has been trained, by performing at least one pruning operation on the basis of the initial style image generation model according to at least one set of pruning parameters; both the initial style image generation model and the target style image real-time generation model are trained based on a plurality of original face sample images and a target style face sample image corresponding to each original face sample image; and the style image real-time generation model varies with the pruning parameters.
In a fourth aspect, an embodiment of the present disclosure further provides a training apparatus for a style image generation model, including:

a sample acquisition module, configured to acquire a plurality of original face sample images and a target style face sample image corresponding to each original face sample image;

a first training module, configured to train an initial style image generation model based on the plurality of original face sample images and the target style face sample image corresponding to each original face sample image;

a model pruning module, configured to perform at least one pruning operation on the basis of the initial style image generation model according to at least one set of pruning parameters, to obtain at least one style image real-time generation model, wherein the style image real-time generation model varies with the pruning parameters; and

a second training module, configured to train the at least one style image real-time generation model based on the plurality of original face sample images and the target style face sample image corresponding to each original face sample image, to obtain a trained target style image real-time generation model.
In a fifth aspect, an embodiment of the present disclosure further provides an electronic device, including: a processing apparatus; and a memory configured to store instructions executable by the processing apparatus; wherein the processing apparatus is configured to read the executable instructions from the memory and execute the executable instructions to implement any style image generation method provided by the embodiments of the present disclosure, or to implement any training method for a style image generation model provided by the embodiments of the present disclosure.

In a sixth aspect, an embodiment of the present disclosure further provides a computer-readable storage medium storing a computer program which, when executed by a processing apparatus, implements any style image generation method provided by the embodiments of the present disclosure, or implements any training method for a style image generation model provided by the embodiments of the present disclosure.
Compared with the prior art, the technical solutions provided by the embodiments of the present disclosure have at least the following advantages:

After an initial style image generation model is trained based on a plurality of original face sample images and a target style face sample image corresponding to each original face sample image, the initial style image generation model is pruned based on at least one set of pruning parameters, and the pruned model is trained further to obtain a style image real-time generation model. The style image real-time generation model is smaller in storage footprint and lower in computational complexity than the initial style image generation model and is capable of generating style images in real time; therefore, in the application stage of the style image real-time generation model, the model can be run on the user's terminal device to generate, in real time, style images that meet the user's needs. Moreover, the style image real-time generation model varies with the pruning parameters; that is, the style image real-time generation model can be trained in different ways. In this way, the problems that existing models are trained in a single manner and cannot meet users' demand for generating style images in real time on their terminal devices are solved, the effect of generating style images for users in real time is achieved, and the user experience of the image style conversion function is improved. Furthermore, different style image real-time generation models are compatible with terminal devices of different performance, so that the style image generation method in the embodiments of the present disclosure can be widely applied to terminal devices of different performance.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the present disclosure.

To more clearly illustrate the technical solutions in the embodiments of the present disclosure or in the prior art, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, those of ordinary skill in the art can also obtain other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a style image generation method according to an embodiment of the present disclosure;

FIG. 2 is a flowchart of another style image generation method according to an embodiment of the present disclosure;

FIG. 3 is a schematic diagram of the position of a face region bounding box on a first original face image according to an embodiment of the present disclosure;

FIG. 4 is a flowchart of another style image generation method according to an embodiment of the present disclosure;

FIG. 5 is a flowchart of another style image generation method according to an embodiment of the present disclosure;

FIG. 6 is a flowchart of a training method for a style image generation model according to an embodiment of the present disclosure;

FIG. 7 is a flowchart of another training method for a style image generation model according to an embodiment of the present disclosure;

FIG. 8 is a flowchart of another training method for a style image generation model according to an embodiment of the present disclosure;

FIG. 9 is a flowchart of another training method for a style image generation model according to an embodiment of the present disclosure;

FIG. 10 is a schematic diagram of a mouth material according to an embodiment of the present disclosure;

FIG. 11 is a schematic structural diagram of a style image generation apparatus according to an embodiment of the present disclosure;

FIG. 12 is a schematic structural diagram of a training apparatus for a style image generation model according to an embodiment of the present disclosure;

FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
DETAILED DESCRIPTION
In order that the above objects, features and advantages of the present disclosure can be understood more clearly, the solutions of the present disclosure are further described below. It should be noted that, in the case of no conflict, the embodiments of the present disclosure and the features in the embodiments may be combined with each other.

Many specific details are set forth in the following description to facilitate a full understanding of the present disclosure; however, the present disclosure may also be implemented in other ways different from those described herein. Obviously, the embodiments in this specification are only a part, rather than all, of the embodiments of the present disclosure.
FIG. 1 is a flowchart of a style image generation method according to an embodiment of the present disclosure. The embodiment of the present disclosure is applicable to generating a style image of any style based on an original face image. The style mentioned in the embodiments of the present disclosure may refer to an image effect, such as a Japanese comic style, a European and American comic style, an oil painting style, a sketch style, or a cartoon style, and may be determined according to the classification of image styles in the field of image processing. The original face image may refer to any image that includes a face region; it may be an image captured by a device with a shooting function, or an image produced by a drawing technique.

The style image generation method provided by the embodiments of the present disclosure may be executed by a style image generation apparatus. The apparatus may be implemented by software and/or hardware and may be integrated on any electronic device with computing capability, such as a terminal, which may include, but is not limited to, a smart mobile terminal, a tablet computer, or a personal computer. Furthermore, the style image generation apparatus may be implemented as an independent application program or as an applet integrated on a public platform, or as a functional module integrated in an application program or applet that has a style image generation function, where the application program or applet may include, but is not limited to, a video interaction application or a video interaction applet.
As shown in FIG. 1, the style image generation method provided by the embodiment of the present disclosure may include:

S101. Acquire an original face image.
Exemplarily, when a user needs to generate a style image, the user may upload an image stored in the terminal, or capture an image or a video in real time through an image capturing apparatus of the terminal. The terminal may acquire the original face image to be processed according to the user's image selection operation, image capture operation, or image upload operation on the terminal.

S102. Use a pre-trained target style image real-time generation model to obtain a target style face image corresponding to the original face image.
The target style image real-time generation model is obtained by training at least one style image real-time generation model, which is in turn obtained, after an initial style image generation model has been trained, by performing at least one pruning operation on the basis of the initial style image generation model according to at least one set of pruning parameters. Both the initial style image generation model and the target style image real-time generation model are trained based on a plurality of original face sample images and a target style face sample image corresponding to each original face sample image. The style image real-time generation model varies with the pruning parameters.

Optionally, after the initial style image generation model is trained, a first pruning parameter of the initial style image generation model may be acquired, and at least one pruning operation may be performed on the basis of the initial style image generation model according to the first pruning parameter. Exemplarily, taking the case where the first pruning parameter is a first importance factor of an activation layer, the first importance factor of the activation layer in the initial style image generation model may be acquired, and the activation layer in the initial style image generation model and the convolution layer corresponding to the activation layer may be pruned according to the first importance factor, to obtain at least one style image real-time generation model; the at least one style image real-time generation model is then trained further to obtain a trained target style image real-time generation model.

Further, a style image real-time generation model may be obtained by performing the model training and pruning operations multiple times. Exemplarily, at least two pruning operations are performed on the initial style image generation model to obtain a first style image real-time generation model and a second style image real-time generation model. For example: an initial style image generation model is trained based on a plurality of original face sample images and a target style face sample image corresponding to each original face sample image, and a first importance factor of an activation layer in the initial style image generation model is acquired; according to the first importance factor, the activation layer in the initial style image generation model and the convolution layer corresponding to the activation layer are pruned to obtain a first style image real-time generation model; the first style image real-time generation model is trained based on the plurality of original face sample images and the target style face sample image corresponding to each original face sample image, a trained first style image real-time generation model is obtained, and a second importance factor of an activation layer in the trained first style image real-time generation model is acquired; based on the second importance factor, the activation layer of the trained first style image real-time generation model and the convolution layer corresponding to that activation layer are pruned to obtain a second style image real-time generation model; and the second style image real-time generation model is trained based on the plurality of original face sample images and the target style face sample image corresponding to each original face sample image, to obtain a trained second style image real-time generation model. Both the trained first style image real-time generation model and the trained second style image real-time generation model can serve as the target style image real-time generation model and can generate style images in real time. Although two rounds of training and pruning are described above as an example, in practice the number of model training and pruning rounds can be determined flexibly, and the above example should not be construed as a specific limitation on the embodiments of the present disclosure.
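As a concrete illustration of the importance-factor pruning described above, the following is a minimal sketch, assuming a PyTorch generator in which each convolution layer is followed by a batch-normalization layer and treating the absolute batch-normalization scale as the importance factor of the corresponding activation channels. The keep ratio and the function name are illustrative assumptions rather than part of the disclosure, and the input channels of the next layer would have to be sliced to match the pruned output channels.

```python
import torch
import torch.nn as nn

def prune_conv_bn(conv: nn.Conv2d, bn: nn.BatchNorm2d, keep_ratio: float = 0.5):
    """Prune the output channels of one conv layer (and its BatchNorm) whose
    importance factor |gamma| is lowest, keeping a `keep_ratio` fraction."""
    importance = bn.weight.detach().abs()              # per-channel importance factor
    n_keep = max(1, int(conv.out_channels * keep_ratio))
    keep = torch.argsort(importance, descending=True)[:n_keep].sort().values

    pruned_conv = nn.Conv2d(conv.in_channels, n_keep, conv.kernel_size,
                            conv.stride, conv.padding, bias=conv.bias is not None)
    pruned_conv.weight.data = conv.weight.data[keep].clone()
    if conv.bias is not None:
        pruned_conv.bias.data = conv.bias.data[keep].clone()

    pruned_bn = nn.BatchNorm2d(n_keep)
    for attr in ("weight", "bias", "running_mean", "running_var"):
        getattr(pruned_bn, attr).data = getattr(bn, attr).data[keep].clone()
    return pruned_conv, pruned_bn
```

In the iterated scheme described above, such a pruning step would be applied after each round of training, and the pruned model would then be trained further before the next round.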
That is, in the embodiments of the present disclosure, at least two pruning operations may be performed on the basis of the initial style image generation model to correspondingly obtain at least two style image real-time generation models, and the at least two style image real-time generation models are trained to obtain at least two target style image real-time generation models, where the at least two target style image real-time generation models correspond to different device performance information. Correspondingly, before the pre-trained target style image real-time generation model is used to obtain the target style face image corresponding to the original face image, the method further includes: acquiring, based on current device performance information, a target style image real-time generation model adapted to the current device performance information. Exemplarily, after receiving a model acquisition request or a model delivery request from a terminal device, the server may, according to the current device performance information of the terminal device carried in the request, send to the terminal device the target style image real-time generation model adapted to that device performance information. The current device performance information of the terminal device may include, but is not limited to, storage space usage information of the terminal device, processor operation indicators, and any other information that can be used to measure the current operating performance of the terminal device.

Exemplarily, when the terminal device currently has a large amount of free storage space and a high processor operation indicator, the current operating performance of the terminal device is relatively good, and the initial style image generation model may be sent to the terminal device; otherwise, a target style image real-time generation model may be sent to the terminal device.
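The disclosure does not prescribe a concrete selection rule; the following is a minimal server-side sketch under assumed performance metrics (free memory and a processor score) and hypothetical model file names, choosing the largest model variant the requesting device can support.

```python
# Hypothetical model variants produced by successive pruning rounds, ordered
# from largest (initial model) to smallest (most heavily pruned).
MODEL_VARIANTS = [
    {"name": "style_gen_full.pt",    "min_free_mem_mb": 2048, "min_cpu_score": 80},
    {"name": "style_gen_pruned1.pt", "min_free_mem_mb": 1024, "min_cpu_score": 50},
    {"name": "style_gen_pruned2.pt", "min_free_mem_mb": 0,    "min_cpu_score": 0},
]

def select_model(free_mem_mb: int, cpu_score: int) -> str:
    """Return the largest model variant the requesting device can support."""
    for variant in MODEL_VARIANTS:
        if (free_mem_mb >= variant["min_free_mem_mb"]
                and cpu_score >= variant["min_cpu_score"]):
            return variant["name"]
    return MODEL_VARIANTS[-1]["name"]  # smallest model as the fallback
```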
The initial style image generation model or the target style image real-time generation model may include any network model that supports non-aligned training, such as a Conditional Generative Adversarial Networks (CGAN) model or a Cycle Consistent Adversarial Networks (Cyclegan) model, which is not specifically limited in the embodiments of the present disclosure. After the target style face image is obtained, it may be displayed on the terminal device for the user to view.

According to the technical solutions of the embodiments of the present disclosure, after the initial style image generation model is trained based on a plurality of original face sample images and a target style face sample image corresponding to each original face sample image, the initial style image generation model is pruned based on its first pruning parameter, and the pruned model is trained further to obtain a target style image real-time generation model. The target style image real-time generation model is smaller in storage footprint and lower in computational complexity than the initial style image generation model and can generate style images in real time; therefore, in the application stage, the target style image real-time generation model can be used to generate, in real time, style images that meet user needs. The embodiments of the present disclosure solve the problems that existing models are trained in a single manner and cannot meet users' demand for generating style images in real time, achieve the effect of generating style images for users in real time, and improve the user experience of the image style conversion function. Furthermore, different target style image real-time generation models are compatible with terminal devices of different performance, so that the style image generation method in the embodiments of the present disclosure can be widely applied to terminal devices of different performance.

On the basis of the above technical solutions, optionally, the plurality of original face sample images and the target style face sample image corresponding to each original face sample image are respectively the input and the output of a pre-trained target image model. The target image model has the function of generating style images and is used to generate style image samples in the process of training the initial style image generation model and the target style image real-time generation model, so that the sample data used for training the two models is consistent, which reduces the training difficulty of the target style image real-time generation model. The target image model may include any network model that supports non-aligned training, such as a conditional generative adversarial network CGAN model or a cycle-consistent generative adversarial network Cyclegan model, which is not specifically limited in the embodiments of the present disclosure.
FIG. 2 is a flowchart of another style image generation method according to an embodiment of the present disclosure, which further optimizes and expands the above technical solutions and can be combined with each of the above optional implementations. As shown in FIG. 2, the style image generation method may include:

S201. Acquire an original face image.

S202. Recognize the face region of the original face image, and determine parameter information of a face region bounding box and a rotation angle of the face region.
After the original face image is acquired, any available face recognition technique may be used to recognize the face region of the original face image and output the parameter information of the bounding box enclosing the face region on the original face image, that is, the parameter information of the face region bounding box. Meanwhile, a keypoint detection technique is used to determine the keypoints of the face region, and the rotation angle of the face region is then determined based on these keypoints.

The parameter information of the face region bounding box includes the position of the bounding box on the original face image. Further, the parameter information of the face region bounding box may also include the size and shape of the face region bounding box. The size of the face region bounding box may be determined by the parameters set in the adopted face recognition technique, or may be set in a customized manner. The face region bounding box may be any regular geometric figure. The rotation angle of the face region refers to the angle by which the face region should be rotated on the original face image in order to obtain an image that meets a preset face position requirement.

By obtaining the rotation angle of the face region with a keypoint detection technique while recognizing the face region, and using it directly in the face alignment adjustment, the complex operation of determining the affine transformation matrix for face region position adjustment through the least squares method or the singular value decomposition (SVD) method can be omitted, which improves the efficiency of face position adjustment and thus enables real-time face position adjustment.
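The disclosure does not fix a particular keypoint scheme; as one common choice, sketched here under that assumption, the in-plane rotation (roll) angle can be estimated directly from the two eye landmarks.

```python
import math

def face_roll_angle(left_eye, right_eye):
    """Estimate the in-plane rotation (roll) of a face, in radians, from the
    pixel coordinates (x, y) of the two eye centers; rotating the face region
    by -roll makes the eye line horizontal."""
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    return math.atan2(dy, dx)
```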
S203. Adjust the position of the face region based on the parameter information of the face region bounding box and the rotation angle of the face region, to obtain a first face image.

The parameter information of the face region bounding box may include the position of the bounding box on the original face image, which may be represented by the position coordinates of the vertices of the face region bounding box on the original face image, or by the distances of its edges from the image boundaries. Exemplarily, with reference to the existing affine transformation principle, an affine transformation matrix for face region position adjustment may be constructed based on the parameter information of the face region bounding box and the rotation angle of the face region, and the position of the face region on the original face image may be adjusted to obtain an image that meets the preset face position requirement, namely the first face image.

It should be noted that, in the embodiments of the present disclosure, the preset face position requirement may be that, after the position adjustment, the face region is located in the central region of the whole image; or that the facial features of the face region are located at specific positions of the whole image; or that the area proportions of the face region and the background region (the remaining image region after the face region is removed) in the whole image satisfy a proportion requirement. By setting such a proportion requirement, the face region can be prevented from occupying an excessively large or excessively small proportion of the whole image, achieving display balance between the face region and the background region.

The position adjustment operations on the face region may include, but are not limited to, rotation, translation, reduction, enlargement, and cropping. According to the actual position of the face region bounding box on the original face image and the preset face position requirement, at least one position adjustment operation may be flexibly selected to adjust the position of the face region until a face image meeting the preset face position requirement is obtained. In the process of adjusting the position of the face region on the original face image, the original face image may be adjusted as a whole, or a matting technique may be used to cut out the face region bounding box or a sub-region including the face region, so that the position of the bounding box or sub-region is adjusted separately, which is not specifically limited in the embodiments of the present disclosure.
In the embodiments of the present disclosure, by adjusting the position of the face region on the original face image before generating the target style face image corresponding to the original face image, normalized preprocessing of the original face image is achieved, which ensures the generation effect of subsequent style images.

S204. Use the pre-trained target style image real-time generation model to obtain the target style face image corresponding to the first face image.

After the target style image is obtained, it may be further processed flexibly according to style image processing requirements, such as image background fusion requirements and face position restoration requirements.
Optionally, the style image generation method provided by the embodiments of the present disclosure further includes:

acquiring a target face region in the target style face image; and

adjusting the position of the target face region in the target style face image to obtain a first style face image whose face region position corresponds to that of the original face image, that is, restoring the position of the target face region in the target style face image to a position consistent with the position of the face region in the original face image, thereby reducing the difference in face region position between the target style face image and the original face image. Exemplarily, if the first face image is obtained by adjusting the position of the face region on the original face image based on a constructed affine transformation matrix M, the inverse matrix M′ of the affine transformation matrix M may be obtained, and the position of the target face region in the target style face image may be adjusted using the inverse matrix M′, to obtain the first style face image.
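A minimal sketch of this restoration step, assuming OpenCV and the 2×3 alignment matrix M constructed earlier; the function name and argument layout are illustrative.

```python
import cv2
import numpy as np

def restore_face_position(styled_face, M, original_size):
    """Warp the stylized face back to the face position of the original image
    using the inverse M' of the 2x3 affine matrix M used for alignment."""
    M_inv = cv2.invertAffineTransform(M)   # M' = inverse of M
    width, height = original_size
    return cv2.warpAffine(styled_face, M_inv, (width, height))
```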
Further, the style image generation method provided by the embodiments of the present disclosure further includes:

fusing the target face region in the first style face image with a target background region to obtain a second style face image. The target background region (that is, the remaining image region excluding the face region) may be the background region of the original face image, or a background region processed by a background processing algorithm, for example the background region of the target style face image; on the basis of ensuring that a style image with a good display effect is provided for the user, this is not specifically limited in the embodiments of the present disclosure. Fusing with the target background region can optimize the display effect of the final style image.

Specifically, any available image fusion technique may be used to fuse the target face region in the first style face image with the target background region. Taking fusing the target face region in the first style face image with the background region of the original face image to obtain the second style face image as an example, apart from the change of image style, the other image features and image details of the second style face image remain consistent with the original face image; finally, the second style face image may be displayed to the user.
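The disclosure leaves the fusion technique open; the following sketch shows one simple possibility, a feathered alpha blend, under the assumption that a single-channel face mask is available (the mask source and blur kernel size are illustrative).

```python
import cv2
import numpy as np

def fuse_with_background(face_img, background, face_mask):
    """Alpha-blend the restored stylized face into the target background;
    face_mask is a single-channel uint8 mask of the face region (0-255)."""
    alpha = cv2.GaussianBlur(face_mask, (15, 15), 0).astype(np.float32) / 255.0
    alpha = alpha[..., None]  # broadcast the mask over the color channels
    blended = (alpha * face_img.astype(np.float32)
               + (1.0 - alpha) * background.astype(np.float32))
    return blended.astype(np.uint8)
```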
According to the technical solutions of the embodiments of the present disclosure, in the style image generation process, the position of the face region in the original face image to be processed is adjusted, and the pre-trained target style image real-time generation model is then used to obtain the corresponding target style face image in real time, which improves the generation effect of style images and solves the problem of poor image effect after image style conversion in existing schemes. Moreover, in the embodiments of the present disclosure, the rotation angle of the face region can be obtained while the face region is being recognized and used directly in the face position adjustment (also called face alignment), which improves the efficiency of face position adjustment and thus enables real-time face position adjustment.
On the basis of the above technical solutions, optionally, adjusting the position of the face region based on the parameter information of the face region bounding box and the rotation angle of the face region to obtain the first face image includes:

acquiring a preset face position correction parameter value and a preset image size; and

adjusting the position of the face region based on the parameter information of the face region bounding box, the rotation angle of the face region, the face position correction parameter value, and the preset image size, to obtain the first face image. Exemplarily, an affine transformation matrix may be constructed based on the acquired parameters, and the position of the face region may then be adjusted based on the affine transformation matrix.

The face position correction parameter value is used to correct the position of the face region in the position-adjusted image, which may include vertical (up-down) or horizontal (left-right) correction of the face position, so as to improve the accuracy with which the actual position of the face region on the original face image is determined, and thereby ensure the accuracy of the face region position adjustment. For example, if the vertical position of the face region on the original face image determined based on the parameter information of the face region bounding box is higher than the actual position, the preset face position correction parameter value can be used to determine the actual position of the face region accurately. The preset image size refers to the predetermined size of images input to the style image generation model; that is, if the original face image does not satisfy the preset image size, the original face image also needs to be cropped.
Exemplarily, suppose the rotation angle of the face region determined by the keypoint detection technique is denoted Roll, the face position correction parameter value is denoted ymeanScale (whose value range may be set to [0, 1]), and the preset image size is denoted targetSize. The parameter information of the face region bounding box includes the distance of each edge of the bounding box from the boundaries of the original face image. Taking FIG. 3 as an example, suppose the lower left corner of the original face image is the origin of the image coordinate system and the image boundaries passing through the lower left corner are the x-axis and the y-axis; then the distances of the two horizontal edges of the face region bounding box from the x-axis can be denoted as a first distance b and a second distance t, and the distances of the two vertical edges of the face region bounding box from the y-axis can be denoted as a third distance l and a fourth distance r. Based on the above assumptions:
the abscissa of the center of the face region is:

xMean = (l + r)/2;

and the ordinate of the center of the face region is:
yMean = ymeanScale·t + (1 − ymeanScale)·b. Further, according to the affine transformation principle, the affine transformation matrix used to adjust the position of the face region can be expressed as a 2×3 matrix M, an explicit form of which is derived in the embodiment of FIG. 4 below.
FIG. 4 is a flowchart of another style image generation method according to an embodiment of the present disclosure, which further optimizes and expands the above technical solutions and can be combined with each of the above optional implementations. Operations that are the same in FIG. 4 and FIG. 2 are not repeated below; reference may be made to the description of the above embodiments. As shown in FIG. 4, the style image generation method may include:

S301. Acquire an original face image.

S302. Recognize the face region of the original face image, and determine the parameter information of the face region bounding box and the rotation angle of the face region.
The four edges of the face region bounding box are parallel to the four edges of the original face image, and the parameter information of the face region bounding box includes the position parameters of the four edges in the original face image. The face region bounding box may be any regular geometric figure, for example a square. Using a regular figure as the face region bounding box simplifies the representation of the position of the face region bounding box on the original face image.

S303. Acquire the preset face position correction parameter value and the preset image size.

The face position correction parameter value is used to correct the position of the face region in the position-adjusted image.
S304. Calculate the abscissa of the center of the face region based on the horizontal-direction position parameters corresponding to the four edges of the face region bounding box.

S305. Calculate the ordinate of the center of the face region based on the vertical-direction position parameters corresponding to the four edges of the face region bounding box and the face position correction parameter value.

Still taking FIG. 3 as an example, the horizontal-direction position parameters corresponding to the edges of the face region bounding box may include the third distance l and the fourth distance r, and the vertical-direction position parameters may include the first distance b and the second distance t. Then the abscissa of the center of the face region is: xMean = (l + r)/2;

and the ordinate of the center of the face region is:

yMean = ymeanScale·t + (1 − ymeanScale)·b.
S306. Acquire a preset face cropping proportion.

The face cropping proportion edgeScale indicates the cropping multiple of the face region bounding box on the original face image. For example, a face cropping proportion of 2 means that, on the original face image, the image region including the face region is cropped at twice the size of the face region bounding box.

S307. Calculate the side length of the face region based on the face cropping proportion and the side length of the face region bounding box.

Still taking FIG. 3 as an example, the side length of the face region bounding box can be expressed as the difference (r − l) between the fourth distance r and the third distance l, or the difference (t − b) between the second distance t and the first distance b. The side length edgeLength of the face region can then be expressed as:

edgeLength = edgeScale·(r − l).
S308. Calculate a scaling value s based on the side length of the face region and the preset image size.

The scaling value s can be expressed as the ratio between the preset image size and the side length of the face region, specifically s = targetSize/edgeLength.

S309. Construct the affine transformation matrix based on the abscissa of the center of the face region, the ordinate of the center of the face region, the rotation angle of the face region, the preset image size, and the scaling value.
Based on the above parameter representations, the affine transformation matrix M rotates by Roll about the face region center (xMean, yMean), scales by s, and translates that center to the center (targetSize/2, targetSize/2) of the target image; one consistent form is:

M = [ s·cos(Roll)    s·sin(Roll)    targetSize/2 − s·(xMean·cos(Roll) + yMean·sin(Roll))
      −s·sin(Roll)   s·cos(Roll)    targetSize/2 − s·(yMean·cos(Roll) − xMean·sin(Roll)) ]
S310. Adjust the position of the face region based on the affine transformation matrix to obtain the first face image (a worked sketch of steps S303 to S310 is given after S311 below).

S311. Use the pre-trained target style image real-time generation model to obtain the target style face image corresponding to the first face image.
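The following is a minimal sketch of steps S303 to S310, assuming OpenCV/NumPy and the coordinate convention of FIG. 3; the default values of ymean_scale, edge_scale, and target_size, as well as the function name, are illustrative assumptions, and the explicit matrix matches the reconstruction given above.

```python
import cv2
import numpy as np

def align_face(image, box, roll, ymean_scale=0.5, edge_scale=2.0, target_size=256):
    """Build the affine matrix of steps S303-S309 from the bounding-box edge
    positions (l, t, r, b) and the roll angle (radians), then warp (S310)."""
    l, t, r, b = box
    x_mean = (l + r) / 2.0                               # S304
    y_mean = ymean_scale * t + (1.0 - ymean_scale) * b   # S305
    edge_length = edge_scale * (r - l)                   # S307
    s = target_size / edge_length                        # S308
    cos_a, sin_a = s * np.cos(roll), s * np.sin(roll)    # S309
    M = np.array([
        [cos_a,  sin_a, target_size / 2 - (x_mean * cos_a + y_mean * sin_a)],
        [-sin_a, cos_a, target_size / 2 - (y_mean * cos_a - x_mean * sin_a)],
    ], dtype=np.float32)
    aligned = cv2.warpAffine(image, M, (target_size, target_size))  # S310
    return aligned, M
```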
According to the technical solutions of the embodiments of the present disclosure, in the style image generation process, the affine transformation matrix required for face region position adjustment is constructed according to the cropping, scaling, and other requirements on the original face image, and the position of the face region on the original face image is adjusted according to the affine transformation matrix, which ensures the accuracy of the face region position adjustment; the pre-trained target style image real-time generation model is then used to obtain the corresponding target style face image in real time, which improves the generation effect of style images and solves the problem of poor image effect after image style conversion in existing schemes.
On the basis of the above technical solutions, optionally, using the pre-trained target style image real-time generation model to obtain the target style face image corresponding to the first face image includes:

correcting the pixel values of the first face image according to a preset gamma value to obtain a second face image;

performing brightness normalization on the second face image to obtain a brightness-adjusted third face image; for example, the maximum pixel value of the second face image may be determined, and all pixel values of the second face image may then be normalized by the currently determined maximum pixel value; and

using the pre-trained target style image real-time generation model to obtain the target style face image corresponding to the third face image.
Gamma correction, which may also be called gamma nonlinearity or gamma encoding, is a nonlinear operation, or its inverse, performed on the luminance or tristimulus values of light in a film or imaging system. Performing gamma correction on an image compensates for the characteristics of human vision, thereby maximizing the use of the data bits or bandwidth representing light and dark according to human perception of light or of black and white. The preset gamma value may be set in advance and is not specifically limited in the embodiments of the present disclosure; for example, the pixel values of the three RGB channels of the first face image may be corrected simultaneously with a gamma value of 1/1.5. The specific implementation of gamma correction may follow the principles of the prior art.

Through gamma correction and brightness normalization, a second face image with a more balanced brightness distribution can be obtained, facial blemishes are reduced, the phenomenon that an unbalanced image brightness distribution leads to an unsatisfactory generated style image is avoided, and the display effect of the obtained target style image is more stable.
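A minimal sketch of these two preprocessing steps, assuming NumPy images with pixel values in [0, 255] and the example gamma value of 1/1.5 mentioned above; the function name is illustrative.

```python
import numpy as np

def gamma_and_brightness_normalize(img, gamma=1.0 / 1.5):
    """Apply gamma correction to all channels, then normalize brightness by
    the maximum pixel value, yielding the second and third face images."""
    x = img.astype(np.float32) / 255.0
    x = np.power(x, gamma)              # gamma correction with the preset gamma value
    x = x / max(float(x.max()), 1e-6)   # brightness normalization by the maximum value
    return (x * 255.0).astype(np.uint8)
```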
FIG. 5 is a flowchart of another style image generation method according to an embodiment of the present disclosure, used to exemplify the embodiments of the present disclosure. As shown in FIG. 5, a user image is first acquired, and a matting technique may be used to extract the face region of the user image; then, based on the affine-transformation-matrix determination manner in the above embodiments, the affine transformation matrix for adjusting the position of the face region in the user image is determined, and the position of the face region is adjusted using the affine transformation matrix (that is, the face alignment processing in FIG. 5); next, the pre-trained target style image real-time generation model is used to generate the target style face image corresponding to the original face image; finally, the inverse of the affine transformation matrix may be used to restore the position of the face region in the target style image, the restored face region is fused with the background region of the user image, and the background-fused style image is returned to the user.
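For illustration only, the end-to-end flow of FIG. 5 can be composed from the hypothetical helpers sketched above; the `detector` interface (returning a face box, roll angle, and face mask) and the `model` callable are assumptions, not part of the disclosure.

```python
def stylize_user_image(image, model, detector):
    """End-to-end flow of FIG. 5: alignment, stylization, position restoration,
    and background fusion, composed from the sketches defined earlier."""
    box, roll, face_mask = detector(image)
    aligned, M = align_face(image, box, roll)                 # face alignment
    aligned = gamma_and_brightness_normalize(aligned)         # optional preprocessing
    styled = model(aligned)                                   # target style face image
    h, w = image.shape[:2]
    restored = restore_face_position(styled, M, (w, h))       # inverse affine restore
    return fuse_with_background(restored, image, face_mask)   # background fusion
```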
FIG. 6 is a flowchart of a training method for a style image generation model according to an embodiment of the present disclosure. The embodiment of the present disclosure is applicable to training a style image generation model that meets style conversion requirements, where the style image generation model is used to generate a style image corresponding to an original face image. The image style mentioned in the embodiments of the present disclosure may refer to an image effect, such as a Japanese comic style, a European and American comic style, an oil painting style, a sketch style, or a cartoon style, and may be determined according to the classification of image styles in the field of image processing. The training apparatus for the style image generation model provided by the embodiments of the present disclosure may be implemented by software and/or hardware and may be integrated on any electronic device with computing capability, such as a server.

As shown in FIG. 6, the training method for the style image generation model provided by the embodiment of the present disclosure may include:

S601. Acquire a plurality of original face sample images and a target style face sample image corresponding to each original face sample image.
The sample images used in the model training process may be obtained from an open image database. Using a plurality of original face sample images and a target style face sample image corresponding to each original face sample image in the model training of the embodiments of the present disclosure ensures the consistency of the sample data, thereby laying a foundation for achieving a good model training effect.
Optionally, acquiring the plurality of original face sample images and the target style face sample image corresponding to each original face sample image includes:

acquiring a plurality of original face sample images; and

using a pre-trained target image model to obtain the target style face sample image corresponding to each original face sample image.

The target image model has the function of generating style images and is used to generate style image samples in the process of training the initial style image generation model and the style image real-time generation model, so that the sample data subsequently used to train the two models is consistent, which reduces the training difficulty of the style image real-time generation model. The target image model may include any network model that supports non-aligned training, such as a conditional generative adversarial network CGAN model or a cycle-consistent generative adversarial network Cyclegan model, which is not specifically limited in the embodiments of the present disclosure.

Further, the target image model is trained based on style face sample images obtained using an image generation model. The image generation model may include a Generative Adversarial Networks (GAN) model, whose specific implementation principle may refer to the prior art. Exemplarily, the training process of the target image model may include: acquiring a plurality of standard style face sample images, and training a standard image generation model based on the plurality of standard style face sample images; using the standard image generation model to generate a plurality of style face sample images for training the target image model; and training the target image model based on those style face sample images. The aforementioned standard style face sample images may be obtained by professional illustrators drawing style images for a preset number of original face sample images (the number may be determined according to training requirements) in accordance with the current image style requirements.
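As a small sketch of how the pre-trained target image model, acting as a teacher, can produce the paired training data described above; the callable interface of the model is an assumption.

```python
def build_training_pairs(original_images, target_image_model):
    """Run each original face sample through the pre-trained target image model
    to obtain its target style counterpart, yielding (input, label) pairs for
    training the initial and real-time style image generation models."""
    pairs = []
    for img in original_images:
        styled = target_image_model(img)  # teacher output as the training label
        pairs.append((img, styled))
    return pairs
```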
Returning to FIG. 6, in step S602, an initial style image generation model is trained based on the plurality of original face sample images and the target-style face sample image corresponding to each original face sample image.
The initial style image generation model has the style image generation function. It may be any network model that supports non-aligned training, such as a CGAN model or a CycleGAN model, which is not specifically limited in the embodiments of the present disclosure.
S603. Perform at least one cropping operation based on the initial style image generation model to obtain at least one style image real-time generation model.
For example, first cropping parameters of the initial style image generation model may be acquired, and at least one cropping operation is performed on the initial style image generation model based on the first cropping parameters to obtain at least one style image real-time generation model. The first cropping parameters measure the importance of the functional modules or neural network layers in the initial style image generation model. For example, the functional modules or neural network layers whose first cropping parameters are smaller than a preset parameter threshold (whose value can be set flexibly) may be cropped away to obtain a style image real-time generation model. Exemplarily, the first cropping parameters may include, but are not limited to, first importance factors of the activation layers in the initial style image generation model; according to the first importance factors, the activation layers in the initial style image generation model, together with the convolutional layers corresponding to those activation layers, are cropped. For example, the activation layers whose first importance factor is smaller than the preset parameter threshold, and the convolutional layers corresponding to those activation layers, may be cropped away to obtain the style image real-time generation model.
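A minimal sketch of this threshold rule, assuming the first importance factors of the activation layers have already been collected into a mapping (all names illustrative):

```python
# Threshold-based cropping: every activation layer whose first importance
# factor falls below the preset parameter threshold is marked for removal
# together with its corresponding convolutional layer.
def layers_to_crop(importance, threshold=1e-3):
    # `importance` maps activation-layer names to their first importance factor.
    return [name for name, factor in importance.items() if factor < threshold]

# Example pairing of activation layers to the conv layers cropped with them
# (hypothetical): crop_pairs = {"act3": "conv3", "act7": "conv7"}
```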
The style image real-time generation model is obtained by cropping the initial style image generation model, so its storage footprint and computational complexity are both lower than those of the initial style image generation model. It can therefore be compatible with terminal devices of different performance levels, lowers the performance requirements on the terminal device while the model is running, and makes it possible to generate style images in real time. The style image real-time generation model is of the same type as the initial style image generation model, and may likewise be any network model that supports non-aligned training, such as a CGAN model or a CycleGAN model, which is not specifically limited in the embodiments of the present disclosure.
S604. Based on the plurality of original face sample images and the target-style face sample image corresponding to each original face sample image, train the at least one style image real-time generation model to obtain a trained target style image real-time generation model.
By training the at least one style image real-time generation model, a style image real-time generation model that meets the style image generation requirements can be obtained.
In the embodiments of the present disclosure, the training process of the initial style image generation model (the large model) and the style image real-time generation model (the small model) amounts to a large-model/small-model training strategy. Since the style image real-time generation model is built on the training of the initial style image generation model, the sample data used are consistent, which greatly reduces the training difficulty of the real-time model. At the same time, deriving the style image real-time generation model from the trained initial style image generation model allows the large model to supervise the features of the real-time model, which further accelerates the training of the style image real-time generation model.
According to the technical solutions of the embodiments of the present disclosure, after the initial style image generation model is trained based on the plurality of original face sample images and the target-style face sample image corresponding to each original face sample image, the initial style image generation model is cropped based on its first cropping parameters, and the cropped model continues to be trained to obtain the style image real-time generation model. The storage footprint and computational complexity of the style image real-time generation model are both smaller than those of the initial style image generation model, and it has the function of generating style images in real time. Therefore, in the application stage of the style image real-time generation model, it can be used to generate, in real time, style images that meet the user's needs. The embodiments of the present disclosure solve the problems that the training methods of existing models are limited and cannot meet the user's need to generate style images in real time, achieve the effect of generating style images for the user in real time, and improve the user's experience of the image style conversion function.
In the process of obtaining the style image real-time generation model, the training and cropping operations of the model may be performed once or multiple times. Exemplarily, performing at least one cropping operation based on the initial style image generation model includes: performing at least two cropping operations based on the initial style image generation model to obtain a first style image real-time generation model and a second style image real-time generation model. The first style image real-time generation model and the second style image real-time generation model are trained based on the plurality of original face sample images and the target-style face sample image corresponding to each original face sample image, so as to obtain a first target style image real-time generation model and a second target style image real-time generation model, respectively, where the first target style image real-time generation model and the second target style image real-time generation model correspond to different device performance information.
Further, performing at least two cropping operations based on the initial style image generation model to obtain the first style image real-time generation model and the second style image real-time generation model includes:
acquiring first cropping parameters of the initial style image generation model;
cropping the initial style image generation model based on the first cropping parameters to obtain the first style image real-time generation model;
acquiring second cropping parameters of the trained first style image real-time generation model, where the second cropping parameters measure the importance of the functional modules or neural network layers in the first style image real-time generation model; and
cropping the trained first style image real-time generation model based on the second cropping parameters to obtain the second style image real-time generation model.
In the training process of the style image real-time generation models, the number of times the model cropping operation is executed in a loop may be determined by the model training requirements, which is not specifically limited in the embodiments of the present disclosure. The trained first style image real-time generation model, the trained second style image real-time generation model, and so on, can all serve as style image real-time generation models with the function of generating style images in real time. The first style image real-time generation model, the second style image real-time generation model, and other style image real-time generation models may correspond to different device performance information, so that subsequently, according to the performance information of a terminal device, the style image real-time generation model matching that performance information can be delivered to the terminal device. That is, different style image real-time generation models are compatible with terminal devices of different performance levels, so that the style image generation method in the embodiments of the present disclosure can be widely applied to terminal devices of different performance levels.
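The correspondence between real-time models and device performance information can be pictured as a simple lookup on the delivery side; the tier names and model identifiers below are illustrative assumptions, not values from this disclosure:

```python
# Illustrative mapping from device performance tiers to the real-time
# models produced by successive cropping rounds (tier names assumed).
MODEL_BY_TIER = {
    "high": "first_style_realtime_model",    # one cropping round
    "low":  "second_style_realtime_model",   # two cropping rounds
}

def pick_model(device_tier: str) -> str:
    # Fall back to the smallest model for unknown or weakest devices.
    return MODEL_BY_TIER.get(device_tier, "second_style_realtime_model")
```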
On the basis of the above technical solutions, exemplarily, acquiring the first cropping parameters of the initial style image generation model includes:
acquiring first importance factors of the activation layers in the initial style image generation model;
correspondingly, cropping the initial style image generation model based on the first cropping parameters to obtain the first style image real-time generation model includes:
cropping, according to the first importance factors, the activation layers in the initial style image generation model and the convolutional layers corresponding to those activation layers, to obtain the first style image real-time generation model;
correspondingly, acquiring the second cropping parameters of the trained first style image real-time generation model includes:
acquiring second importance factors of the activation layers in the trained first style image real-time generation model; and
correspondingly, cropping the trained first style image real-time generation model based on the second cropping parameters to obtain the second style image real-time generation model includes:
cropping, based on the second importance factors, the activation layers of the trained first style image real-time generation model and the convolutional layers corresponding to those activation layers, to obtain the second style image real-time generation model.
It should be noted that when different original face sample images are used as the model training input, the importance factors of an activation layer in the initial style image generation model obtained after training may differ; in this case, the average of the multiple importance factors may be taken as the first importance factor of that activation layer in the initial style image generation model. Similarly, when different original face sample images are used as the model training input, the importance factors of an activation layer in the trained first style image real-time generation model may also differ; in this case, the average of the multiple importance factors may be taken as the second importance factor of that activation layer in the trained first style image real-time generation model.
Optionally, acquiring the first importance factor of an activation layer in the initial style image generation model includes:
performing a Taylor expansion calculation on the output value of the activation layer in the initial style image generation model, and taking the calculation result as the first importance factor;
correspondingly, acquiring the second importance factor of an activation layer in the trained first style image real-time generation model includes:
performing a Taylor expansion calculation on the output value of the activation layer in the trained first style image real-time generation model, and taking the calculation result as the second importance factor.
Exemplarily, in the embodiments of the present disclosure, an initial style image generation model (the large model) may first be trained; then the first-order Taylor expansion of each activation layer at its output value at the end of training is calculated to estimate the importance of each activation layer, and, according to the first importance factors thus calculated, the unimportant activation layers and the corresponding convolutional layers are cropped away, after which training continues to obtain the first style image real-time generation model. The same cropping procedure is then applied to the first style image real-time generation model, and training continues to obtain the second style image generation model. Although two rounds of training and cropping are used as an example above, in specific applications the number of rounds of model training and cropping can be determined flexibly, and the above example should not be construed as a specific limitation on the embodiments of the present disclosure.
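One common realization of such a first-order Taylor criterion (a PyTorch-style sketch in the spirit of the widely used Taylor pruning heuristic, not code from this disclosure) scores each activation layer by the batch-averaged |activation × gradient|, which also reflects the averaging over different input samples described above:

```python
import torch

def taylor_importance(activations, loss):
    """First-order Taylor importance per activation layer: the batch-
    averaged |a * dL/da| of the layer's output. `activations` maps layer
    names to output tensors captured with forward hooks, each of which
    must participate in `loss`."""
    acts = list(activations.values())
    grads = torch.autograd.grad(loss, acts)
    return {name: (a * g).abs().mean().item()
            for (name, a), g in zip(activations.items(), grads)}

# Pruning loop sketch: train the large model, score its activation layers,
# crop the lowest-scoring layers together with their paired conv layers,
# continue training the cropped model, and repeat for a second, smaller model.
```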
FIG. 7 is a flowchart of another training method for a style image generation model provided by an embodiment of the present disclosure, which further optimizes and expands the above technical solutions and can be combined with each of the above optional implementations. In the training method of the style image generation model and the style image generation method provided by the embodiments of the present disclosure, the processing of the original face image belongs to the same inventive concept except that the image processing objects differ; for content not described in detail in the following embodiments, reference may be made to the descriptions of the above embodiments. FIG. 7 contains operations that are the same as those in FIG. 6, which will not be repeated below; reference may be made to the descriptions of the above embodiments.
As shown in FIG. 7, the training method of the style image generation model provided by the embodiment of the present disclosure may include:
S701. Acquire a plurality of original face sample images.
S702. Identify the face region of an original face sample image, and determine parameter information of a bounding box of the face region and a rotation angle of the face region.
Exemplarily, any available face recognition technique can be used to identify the face region of the original face sample image and to output the bounding box surrounding the face region on the original face sample image; at the same time, a keypoint detection technique can be used to determine the keypoints of the face region and the rotation angle of the face region. The rotation angle of the face region refers to the angle by which the face region should be rotated on the original face sample image in order to obtain an image that meets the preset face position requirements. The parameter information of the face region bounding box characterizes the position of the bounding box on the original face sample image; the size of the face region bounding box may be determined by the parameters set in the adopted face recognition technique, or may be set in a customized manner. The face region bounding box can be any regular geometric figure.
By using the keypoint detection technique to obtain the rotation angle of the face region at the same time as the face region is identified, and using it directly in the face alignment adjustment, the complex operation of determining the affine transformation matrix for the face region position adjustment by the least squares method or the singular value decomposition (SVD) method can be omitted. This improves the efficiency of the face position adjustment and thus makes real-time face position adjustment possible.
S703. Adjust the position of the face region based on the parameter information of the face region bounding box and the rotation angle of the face region, to obtain a first face sample image.
The parameter information of the face region bounding box may include, but is not limited to, the position coordinates of each vertex of the bounding box on the original face sample image, or the distance of each of its edges from the image boundary of the original face sample image. Exemplarily, with reference to the existing affine transformation principle, an affine transformation matrix for the face region position adjustment can be constructed based on the parameter information of the face region bounding box and the rotation angle of the face region, and the face region on the original face sample image is position-adjusted to obtain an image that meets the preset face position requirements, namely the first face sample image.
The position adjustment operations on the face region may include, but are not limited to, rotation, translation, reduction, enlargement, and cropping. In the process of adjusting the position of the face region on the original face sample image, the original face sample image may be position-adjusted as a whole; alternatively, a matting technique may be used to cut out a sub-region that contains the face region bounding box or the face region, so that the position of that bounding box or sub-region is adjusted on its own, which is not specifically limited in the embodiments of the present disclosure.
S704. Acquire a target-style face sample image corresponding to each first face sample image.
Exemplarily, based on the plurality of first face sample images, a pre-trained target image model can be used to obtain the target-style face sample image corresponding to each original face sample image.
S705. Train an initial style image generation model based on the plurality of first face sample images and the target-style face sample image corresponding to each first face sample image.
S706. Perform at least one cropping operation based on the initial style image generation model to obtain at least one style image real-time generation model.
S707. Based on the plurality of first face sample images and the target-style face sample image corresponding to each first face sample image, train the at least one style image real-time generation model to obtain a trained target style image real-time generation model.
According to the technical solutions of the embodiments of the present disclosure, a plurality of first face sample images are obtained by adjusting the position of the face region on each original face sample image, and the plurality of first face sample images, together with the target-style face sample image corresponding to each first face sample image, are used as training samples. This improves the training effect of the initial style image generation model and the style image real-time generation model, solves the problems that the training methods of existing models are limited and cannot meet the user's need to generate style images in real time, improves the generation effect of style images in the model application stage, and solves the problem in existing solutions that the image effect after image style conversion is poor. Moreover, the rotation angle of the face region can be obtained at the same time as the face region is identified and used directly in the face alignment adjustment, which improves the efficiency of the face position adjustment, makes real-time face position adjustment possible, and improves the efficiency of model training.
On the basis of the above technical solutions, optionally, adjusting the position of the face region based on the parameter information of the face region bounding box and the rotation angle of the face region to obtain the first face sample image includes:
acquiring a preset face position correction parameter value and a preset image size; and
adjusting the position of the face region based on the parameter information of the face region bounding box, the rotation angle of the face region, the face position correction parameter value, and the preset image size, to obtain the first face sample image. Exemplarily, an affine transformation matrix can be constructed based on the acquired parameters, and the position of the face region is then adjusted based on the affine transformation matrix.
The face position correction parameter value is used to correct the position of the face region on the position-adjusted image, and may include an up-down correction or a left-right correction of the face position; it improves the accuracy with which the actual position of the face region on the original face sample image is determined, thereby ensuring the accuracy of the face region position adjustment. For example, if the vertical position of the face region on the original face sample image, as determined from the parameter information of the face region bounding box, is higher than the actual position, the preset face position correction parameter value can be used to determine the actual position of the face region accurately. The preset image size refers to the predetermined input image size for the model training process; that is, if an original face sample image does not match the preset image size, it also needs to be cropped, to ensure that the sample images finally used in the model training process have a uniform size.
Exemplarily, suppose the rotation angle of the face region determined by the keypoint detection technique is denoted Roll, the face position correction parameter value is denoted ymeanScale, whose value range may be set to [0, 1], and the preset image size is denoted targetSize. The parameter information of the face region bounding box includes the distance of each of its edges from the boundary of the original face sample image. Taking FIG. 3 as an example, suppose the lower-left corner of the original face sample image is the origin of the image coordinate system, and the image boundaries passing through the lower-left corner are the x-axis and the y-axis, respectively. Then the distances of the two horizontal edges of the face region bounding box from the x-axis can be denoted the first distance b and the second distance t, and the distances of the two vertical edges of the face region bounding box from the y-axis can be denoted the third distance l and the fourth distance r. Based on the above assumptions:
The abscissa value of the center of the face region:
xMean = (l + r) / 2;
The ordinate value of the center of the face region:
yMean = ymeanScale · t + (1 − ymeanScale) · b.
Furthermore, according to the affine transformation principle, the affine transformation matrix used to adjust the position of the face region can be expressed as a 2×3 matrix M, as shown below:
[Formula image PCTCN2021113225-appb-000003: the 2×3 affine transformation matrix M]
FIG. 8 is a flowchart of another training method for a style image generation model provided by an embodiment of the present disclosure, which further optimizes and expands the above technical solutions and can be combined with each of the above optional implementations. FIG. 8 contains operations that are the same as those in FIG. 6 or FIG. 7, which will not be repeated below; reference may be made to the descriptions of the above embodiments.
As shown in FIG. 8, the training method of the style image generation model may include:
S801. Acquire an original face sample image.
S802. Identify the face region of the original face sample image, and determine parameter information of the face region bounding box and the rotation angle of the face region.
Here, the four edges of the face region bounding box are parallel to the four edges of the original face sample image, and the parameter information of the face region bounding box includes position parameters of the four edges in the original face sample image.
S803. Acquire a preset face position correction parameter value and a preset image size.
Here, the face position correction parameter value is used to correct the position of the face region on the position-adjusted image.
S804. Calculate the abscissa value of the center of the face region based on the position parameters, in the horizontal direction, corresponding to the four edges of the face region bounding box.
S805. Calculate the ordinate value of the center of the face region based on the position parameters, in the vertical direction, corresponding to the four edges of the face region bounding box, and on the face position correction parameter value.
S806. Acquire a preset face cropping ratio.
The face cropping ratio edgeScale indicates the cropping multiple applied to the face region bounding box on the original face sample image. For example, a face cropping ratio of 2 means that, on the original face sample image, the image region including the face region is cropped at twice the size of the face region bounding box.
S807. Calculate the side length value of the face region based on the face cropping ratio and the side length value of the face region bounding box.
Still taking FIG. 3 as an example, the side length value of the face region bounding box can be expressed as the difference (r − l) between the third distance l and the fourth distance r, or as the difference (t − b) between the first distance b and the second distance t. The side length value edgeLength of the face region can then be expressed as:
edgeLength = edgeScale · (r − l).
S808. Calculate a scaling size value based on the side length value of the face region and the preset image size.
The scaling size value s can be expressed as the ratio between the preset image size and the side length value of the face region, specifically s = targetSize / edgeLength.
S809. Construct an affine transformation matrix based on the abscissa value of the center of the face region, the ordinate value of the center of the face region, the rotation angle of the face region, the preset image size, and the scaling size value.
Based on the above parameter representations, the affine transformation matrix M can be expressed as follows:
[Formula image PCTCN2021113225-appb-000004: the 2×3 affine transformation matrix M expressed in terms of Roll, s, targetSize, xMean, and yMean]
Here, Roll denotes the rotation angle of the face region determined by the keypoint detection technique, targetSize denotes the preset image size, and (xMean, yMean) denotes the coordinates of the center of the face region. For a detailed explanation of the affine transformation matrix M, reference may be made to the explanations in the foregoing embodiments.
S810. Adjust the position of the face region based on the affine transformation matrix, to obtain a first face sample image.
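A runnable sketch of steps S804 to S810 using OpenCV is given below. It assumes OpenCV's top-left image origin (FIG. 3 uses a bottom-left origin, so the roles of b and t are mirrored accordingly) and composes M as the standard rotate-scale-recenter transform, since the published matrix itself is only available as a formula image:

```python
import cv2

def align_face(image, l, r, t, b, roll_deg,
               ymean_scale=0.5, edge_scale=2.0, target_size=256):
    # S804/S805: face-region centre from the bounding-box edge positions.
    x_mean = (l + r) / 2.0
    y_mean = ymean_scale * t + (1.0 - ymean_scale) * b
    # S806-S808: crop side length and scaling size value.
    edge_length = edge_scale * (r - l)
    s = target_size / edge_length
    # S809: rotation by Roll and scaling by s about the face centre, then a
    # translation moving that centre to the middle of the output image
    # (a standard composition; an assumption about the published matrix).
    M = cv2.getRotationMatrix2D((x_mean, y_mean), roll_deg, s)
    M[0, 2] += target_size / 2.0 - x_mean
    M[1, 2] += target_size / 2.0 - y_mean
    # S810: apply the affine transform to obtain the aligned face image.
    return cv2.warpAffine(image, M, (target_size, target_size))
```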
S811. Acquire a target-style face sample image corresponding to each first face sample image.
Exemplarily, based on the plurality of first face sample images, a pre-trained target image model can be used to obtain the target-style face sample image corresponding to each original face sample image.
S812. Train an initial style image generation model based on the plurality of first face sample images and the target-style face sample image corresponding to each first face sample image.
S813. Perform at least one cropping operation based on the initial style image generation model to obtain at least one style image real-time generation model.
S814. Based on the plurality of first face sample images and the target-style face sample image corresponding to each first face sample image, train the at least one style image real-time generation model to obtain a trained target style image real-time generation model.
According to the technical solutions of the embodiments of the present disclosure, the affine transformation matrix required for the face region position adjustment is constructed according to the cropping, scaling, and other requirements on the original face sample image, and the position of the face region on the original face sample image is adjusted according to the affine transformation matrix, which ensures the accuracy of the face region position adjustment. The plurality of first face sample images, together with the target-style face sample image corresponding to each first face sample image, are used as training samples, which improves the training effect of the initial style image generation model and the style image real-time generation model, solves the problems that the training methods of existing models are limited and cannot meet the user's need to generate style images in real time, improves the generation effect of style images in the model application stage, and solves the problem in existing solutions that the image effect after image style conversion is poor.
FIG. 9 is a flowchart of another training method for a style image generation model provided by an embodiment of the present disclosure, which further optimizes and expands the above technical solutions and can be combined with each of the above optional implementations. FIG. 9 contains operations that are the same as those in FIG. 6, which will not be repeated below; reference may be made to the descriptions of the above embodiments.
As shown in FIG. 9, the training method of the style image generation model may include:
S901. Acquire a plurality of original face sample images and a target-style face sample image corresponding to each original face sample image.
S902. Perform face adjustment on the face region on the target-style face sample image, to obtain a first-style face sample image.
Face adjustment refers to adjusting the face of the person on the target-style face sample image according to the display requirements for the person's face. Face adjustment includes at least one of the following: face shape adjustment and mouth adjustment. Face shape adjustment refers to adjusting the face shape of the person on the target-style face sample image according to the display requirements for the person's face shape, for example a face-slimming adjustment; mouth adjustment refers to adjusting the mouth of the person on the target-style face sample image according to the display requirements for the person's mouth, for example adjusting the mouth shape or keeping the thickness of the mouth outline uniform. That is, the embodiments of the present disclosure support face fine-tuning of the target-style face sample image, so that the presentation effect of the target-style face sample image is more attractive, which in turn ensures that the trained initial style image generation model and style image real-time generation model are more accurate and can, for an arbitrary input image, output a style image with a better display effect.
S903. Train an initial style image generation model based on the plurality of original face sample images and the first-style face sample image corresponding to each original face sample image.
S904. Perform at least one cropping operation based on the initial style image generation model to obtain at least one style image real-time generation model.
S905. Based on the plurality of original face sample images and the first-style face sample image corresponding to each original face sample image, train the at least one style image real-time generation model to obtain a trained target style image real-time generation model.
According to the technical solutions of the embodiments of the present disclosure, face fine-tuning is performed on the target-style face sample images to optimize the display effect of the facial features, which makes it possible to construct high-quality sample data, improves the training effect of the initial style image generation model and the style image real-time generation model, and thus ensures the generation effect of style images in the model application stage.
On the basis of the above technical solutions, optionally, performing face shape adjustment on the face region on the target-style face sample image includes:
determining initial face contour keypoints of the face region on the target-style face sample image, and target face contour keypoints corresponding to the initial face contour keypoints; and
adjusting the face contour of the face region on the target-style face sample image based on the initial face contour keypoints and the target face contour keypoints, to obtain the first-style face sample image.
The initial face contour keypoints can be obtained by performing keypoint detection on the face region of the target-style face sample image with a keypoint detection technique. The target face contour keypoints are determined according to the face shape adjustment requirements. According to the translation transformation between the initial face contour keypoints and the target face contour keypoints, the initial face contour keypoints are moved to the target face contour keypoints, thereby realizing the face adjustment.
Further, adjusting the face contour of the face region on the target-style face sample image based on the initial face contour keypoints and the target face contour keypoints includes:
moving the initial face contour keypoints to the target face contour keypoints, and deforming the face region on the target-style face sample image using a thin-plate spline interpolation function; and
rendering the deformed face region with the face texture of the target-style face sample image, to obtain the first-style face sample image.
The thin-plate spline interpolation function is a two-dimensional deformation processing algorithm, whose specific principle can be implemented with reference to the prior art. Deforming the face region with the thin-plate spline interpolation function ensures the smoothness of the face contour after the face adjustment, and rendering the deformed face region with the face texture of the target-style face sample image ensures the consistency of the face texture after the face adjustment.
Optionally, deforming the face region with the thin-plate spline interpolation function may specifically include:
triangulating the face region on the target-style face sample image to obtain a triangulated mesh, i.e., dividing at least the face region into meshed sub-regions; and
translating the vertices of the triangulated mesh using the thin-plate spline interpolation function.
Depending on the processing requirements, the entire style image region may also be triangulated. Triangulation is used here as an example because it is computationally convenient; in practical applications, other styles of image meshing may of course be adopted adaptively.
Exemplarily, the face region on the target-style face sample image, or the entire target-style face sample image, may be triangulated. After the initial face contour keypoints L1 and the target face contour keypoints L2 of the face region on the target-style face sample image are determined, the thin-plate spline interpolation function is used to interpolate the translation from the initial face contour keypoints L1 to the target face contour keypoints L2 onto each triangular mesh vertex, and each triangular mesh vertex is translated accordingly. Finally, the face texture on the target-style face sample image is used as the current texture, and the new triangular mesh is rendered to obtain the face-slimmed target-style face sample image. Compared with translating only the face contour keypoints, translating all the triangular mesh vertices of the face region reduces the risk of face distortion to a certain extent and preserves the overall presentation effect of the face.
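A small sketch of interpolating the L1-to-L2 keypoint translations onto the mesh vertices, using SciPy's thin-plate-spline RBF interpolator as one possible implementation (not code from this disclosure):

```python
import numpy as np
from scipy.interpolate import RBFInterpolator  # SciPy >= 1.7

def warp_mesh_vertices(vertices, kpts_l1, kpts_l2):
    """Interpolate the L1 -> L2 keypoint translations onto every
    triangular-mesh vertex with a thin-plate-spline RBF, then shift the
    vertices; the warped mesh is re-rendered with the original face
    texture afterwards."""
    displacement = kpts_l2 - kpts_l1                     # (K, 2)
    tps = RBFInterpolator(kpts_l1, displacement,
                          kernel='thin_plate_spline')
    return vertices + tps(vertices)                      # (V, 2)
```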
On the basis of the above technical solutions, optionally, performing mouth adjustment on the face region on the target-style face sample image includes:
determining mouth keypoints of the face region on the target-style face sample image; in addition, the mouth state can be determined based on the mouth keypoints;
removing the mouth, determined based on the mouth keypoints, from the face region of the target-style face sample image to obtain an incomplete-style face sample image; for example, when the mouth state is determined to be open, the mouth determined based on the mouth keypoints may be removed from the face region of the target-style face sample image to obtain the incomplete-style face sample image; and
fusing pre-generated mouth material with the incomplete-style face sample image to obtain the first-style face sample image.
The mouth keypoints can likewise be obtained by performing keypoint detection on the face region of the target-style face sample image with a keypoint detection technique. The mouth state can be determined from the distances between keypoints belonging to the upper lip and the lower lip. For example, among the keypoints of the upper lip and the lower lip, if the number of vertically corresponding keypoint pairs whose distance exceeds a distance threshold is greater than a count threshold, the mouth is considered to be open; otherwise it is considered closed. Both the distance threshold and the count threshold can be set adaptively. If the mouth is determined to be open, the mouth on the target-style face sample image is replaced with pre-designed mouth material to ensure the display effect of the mouth.
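The open/closed decision described above reduces to a count over vertically paired lip keypoints; the thresholds in this sketch are illustrative:

```python
import numpy as np

def mouth_is_open(upper_lip, lower_lip, dist_thresh=4.0, count_thresh=3):
    """upper_lip / lower_lip: (K, 2) arrays of vertically corresponding
    lip keypoints. The mouth counts as open when more than count_thresh
    pairs are farther apart than dist_thresh (both set adaptively)."""
    gaps = np.linalg.norm(upper_lip - lower_lip, axis=1)
    return int((gaps > dist_thresh).sum()) > count_thresh
```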
Optionally, removing the mouth, determined based on the mouth keypoints, from the face region of the target-style face sample image to obtain the incomplete-style face sample image includes:
determining, based on the mouth keypoints, a sub-region surrounding the mouth in the face region of the target-style face sample image, where the size of the sub-region can be determined adaptively and is not specifically limited in the embodiments of the present disclosure;
determining the mouth boundary line in the sub-region using a fixed-boundary solving algorithm, where a fixed-boundary solving algorithm refers to an algorithm in the image processing field for determining the boundary of a target figure (for example, the mouth), such as an edge detection algorithm based on the Laplace operator, which can be implemented with reference to the prior art; the boundary conditions in the calculation are determined by the keypoints included on the boundary of the sub-region, i.e., by the face-skin keypoints on the boundary of the sub-region; and
removing the mouth from the face region of the target-style face sample image based on the mouth boundary line, to obtain the incomplete-style image.
By first determining the mouth keypoints, then determining the sub-region surrounding the mouth from the mouth keypoints, and finally removing the mouth from the target-style face sample image by determining the mouth boundary line, not only is the sensitivity of mouth region determination improved, but the accuracy and reasonableness of the mouth region removal are also guaranteed.
Further, fusing the pre-generated mouth material with the incomplete-style face sample image includes:
aligning the keypoints annotated on the mouth material with the mouth keypoints in the face region of the target-style face sample image, and deforming the mouth material based on the thin-plate spline interpolation function; and
rendering the deformed mouth material with the mouth texture of the target-style face sample image.
The keypoints annotated on the mouth material correspond to the mouth keypoints in the face region of the target-style face sample image; for example, the keypoint coordinates are determined in the same image coordinate system. Aligning the keypoints annotated on the mouth material with the mouth keypoints in the face region of the target-style face sample image, i.e., establishing the keypoint mapping between the mouth material and the mouth in the face region of the target-style face sample image, makes it possible to paste the mouth material back onto the mouth region of the incomplete-style sample image. Deforming the mouth material with the thin-plate spline interpolation function ensures the smoothness of the mouth material boundary and the display effect of the mouth.
Optionally, fusing the pre-generated mouth material with the incomplete-style image further includes:
determining the inner boundary line and the outer boundary line of the mouth contour on the mouth material;
and deforming the mouth material based on the thin-plate spline interpolation function includes:
deforming the region inside the inner boundary line of the mouth material using the thin-plate spline interpolation function; and
deforming the region between the inner boundary line and the outer boundary line using an optimization-solving algorithm.
By processing the mouth material with a combination of the thin-plate spline interpolation function and the optimization-solving algorithm, it can be ensured that the thickness of the outline along the mouth edge remains unchanged. FIG. 10 is a schematic diagram of mouth material provided by an embodiment of the present disclosure, and specifically shows the inner boundary line and the outer boundary line of the mouth edge outline; the region between the inner boundary line and the outer boundary line can be filled with an appropriate color as required.
Exemplarily, in order to ensure that the thickness of the outline along the mouth edge remains unchanged while the mouth material is fused with the incomplete-style image, a double-layer mesh fitting the mouth outline needs to be generated for the mouth material: the inner mesh is the mesh obtained by meshing the region inside the inner boundary line of the mouth material, and the outer mesh is the mesh obtained by meshing the region between the inner boundary line and the outer boundary line. Both the inner mesh and the outer mesh may be triangulated meshes. The deformation control of the outer mesh can be realized based on an as-rigid-as-possible-without-rotation transformation, while the deformation control of the inner mesh can still be realized based on the thin-plate spline interpolation function. Specifically, in the process of fusing the mouth material with the incomplete-style image, the inner mesh may first be deformed with the thin-plate spline interpolation function; then, following the thin-plate spline interpolation, the vertices of the outer mesh are obtained by solving the optimization problem
min_u Σ ‖∇u − I‖²,
where u denotes the unknown vertices, ∇ denotes the gradient with respect to the original positions, and I is the 2×2 identity matrix. After the vertices of the outer mesh are determined, the region of the outer mesh can also be determined, so that the region between the inner boundary line and the outer boundary line remains unchanged during the fusion of the mouth material with the incomplete-style image; that is, the thickness of the outline along the mouth edge remains unchanged. Moreover, besides optimizing the thickness of the mouth edge outline when the mouth is open, there is also a need to control the thickness of the mouth edge outline when the mouth is closed, and the same approach can be used. The above mouth adjustment operations are all performed on the target-style face sample image, optimizing the display effect of the mouth on the target-style face sample image and thereby optimizing the effect of the trained initial style image generation model and style image real-time generation model.
In addition, it should be noted that, in the embodiments of the present disclosure, the same terms appear in the descriptions of the technical solutions for both the model training stage and the style image generation stage; the meaning of a term should be understood in combination with the specific implementation stage.
FIG. 11 is a schematic structural diagram of a style image generation apparatus provided by an embodiment of the present disclosure. The embodiments of the present disclosure are applicable to generating a style image of an arbitrary style based on an original face image. The apparatus can be implemented in software and/or hardware, and can be integrated on any electronic device with computing capability, for example a terminal, which may include, but is not limited to, a smart mobile terminal, a tablet computer, a personal computer, and the like.
As shown in FIG. 11, the style image generation apparatus 1100 provided by the embodiment of the present disclosure may include an original face image acquisition module 1101 and a target style face image generation module 1102, where:
the original face image acquisition module 1101 is configured to acquire an original face image; and
the target style face image generation module 1102 is configured to use a pre-trained target style image real-time generation model to obtain a target style face image corresponding to the original face image.
Here, the target style image real-time generation model is obtained by: after the initial style image generation model is obtained by training, performing at least one cropping operation on the initial style image generation model according to at least one set of cropping parameters to obtain at least one style image real-time generation model, and training that model. Both the initial style image generation model and the target style image real-time generation model are trained based on a plurality of original face sample images and a target-style face sample image corresponding to each original face sample image, and the style image real-time generation model varies with the cropping parameters.
可选的,多个原始人脸样本图像和与每个原始人脸样本图像对应的目标风格人脸样本图像分别为预先训练的目标图像模型的输入和输出,所述目标图像模型用于生成原始人脸样本图像对应的目标风格人脸样本图像,以为所述初始风格图像生成模型和所述目标风格图像生成模型提供训练样本。Optionally, a plurality of original face sample images and a target-style face sample image corresponding to each original face sample image are respectively the input and output of a pre-trained target image model, and the target image model is used to generate the original face sample image. The target style face sample image corresponding to the face sample image provides training samples for the initial style image generation model and the target style image generation model.
可选的,基于初始风格图像生成模型按照至少两组裁剪参数进行至少两次裁剪操作,以相应地得到至少两个风格图像实时生成模型,对至少两个风格图像实时生成模型进行训 练得到至少两个目标风格图像实时生成模型,并且,至少两个目标风格图像实时生成模型分别对应不同的设备性能信息;Optionally, at least two cropping operations are performed according to at least two sets of cropping parameters based on the initial style image generation model to obtain at least two style image real-time generation models accordingly, and at least two style image real-time generation models are trained to obtain at least two style image real-time generation models. real-time generation models for each target style image, and at least two target style image real-time generation models respectively correspond to different device performance information;
相应的,本公开实施例提供的风格图像生成装置1100还包括:Correspondingly, the style image generating apparatus 1100 provided by the embodiment of the present disclosure further includes:
模型获取模块,用于基于当前设备性能信息,获取与当前设备性能信息相适配的目标风格图像实时生成模型。The model obtaining module is used for obtaining the real-time generation model of the target style image adapted to the current equipment performance information based on the current equipment performance information.
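As an illustration of how such a model acquisition module might choose among the cropped models, the following Python sketch maps device performance information to a model file; the tier names, thresholds, and file names are hypothetical and are not specified by the disclosure:

    # Hypothetical mapping from performance tiers to differently cropped models.
    CANDIDATE_MODELS = {
        "high": "target_style_model_light_crop.pt",   # larger model, higher fidelity
        "low":  "target_style_model_heavy_crop.pt",   # smaller model, faster inference
    }

    def select_model(device_info: dict) -> str:
        """Return the target style image real-time generation model that is
        adapted to the current device performance information."""
        # Assumed criterion: available memory and a coarse compute score.
        high_end = (device_info.get("ram_gb", 0) >= 6
                    and device_info.get("compute_score", 0) >= 500)
        return CANDIDATE_MODELS["high" if high_end else "low"]

    # Example: a device with 8 GB of RAM and a score of 700 gets the lighter crop.
    print(select_model({"ram_gb": 8, "compute_score": 700}))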
Optionally, the style image generation apparatus 1100 provided by the embodiment of the present disclosure further includes:
a face recognition module, configured to recognize a face region of the original face image, and to determine parameter information of a bounding box of the face region and a rotation angle of the face region;
a face position adjustment module, configured to adjust the position of the face region based on the parameter information of the bounding box of the face region and the rotation angle of the face region to obtain a first face image, so that the target style face image is obtained based on the first face image.
Optionally, the face position adjustment module includes:
a first parameter acquisition unit, configured to acquire a preset face position correction parameter value and a preset image size, wherein the face position correction parameter value is used to correct the position of the face region in the position-adjusted image;
a first face image determination unit, configured to adjust the position of the face region based on the parameter information of the bounding box of the face region, the rotation angle of the face region, the face position correction parameter value, and the preset image size to obtain the first face image.
Optionally, the four sides of the bounding box of the face region are parallel to the four sides of the original face image, and the parameter information of the bounding box includes position parameters of the four sides in the original face image; the first face image determination unit includes:
a first coordinate calculation subunit, configured to calculate the abscissa of the center of the face region based on the horizontal position parameters corresponding to the four sides of the bounding box;
a second coordinate calculation subunit, configured to calculate the ordinate of the center of the face region based on the vertical position parameters corresponding to the four sides of the bounding box and the face position correction parameter value;
an affine transformation matrix construction subunit, configured to construct an affine transformation matrix based on the abscissa of the center of the face region, the ordinate of the center of the face region, the rotation angle of the face region, and the preset image size;
a first face image determination subunit, configured to adjust the position of the face region based on the affine transformation matrix to obtain the first face image.
Optionally, the face position adjustment module further includes:
a face cropping ratio acquisition unit, configured to acquire a preset face cropping ratio;
a face region side length determination unit, configured to calculate the side length of the face region based on the face cropping ratio and the side length of the bounding box of the face region;
a scaling value determination unit, configured to calculate a scaling value based on the side length of the face region and the preset image size.
Correspondingly, the affine transformation matrix construction subunit is specifically configured to:
construct the affine transformation matrix based on the abscissa of the center of the face region, the ordinate of the center of the face region, the rotation angle of the face region, the preset image size, and the scaling value.
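Taken together, these subunits amount to composing one rotation-plus-scale matrix about the face center and translating that center to the middle of the output image. A minimal sketch with OpenCV, assuming the bounding-box position parameters are the pixel coordinates of its left, right, top, and bottom sides; the preset values crop_ratio and y_correction are illustrative, not the disclosure's:

    import cv2

    def align_face(img, left, right, top, bottom, angle_deg,
                   out_size=256, crop_ratio=1.8, y_correction=0.1):
        """Build the affine transformation matrix from the face bounding box
        and rotation angle, then warp to an out_size x out_size image."""
        # Face center: x from the horizontal side positions, y from the
        # vertical side positions shifted by the position correction value.
        cx = (left + right) / 2.0
        cy = (top + bottom) / 2.0 + y_correction * (bottom - top)
        # Side length of the region to keep, and the scale it implies.
        side = crop_ratio * max(right - left, bottom - top)
        scale = out_size / side
        # Rotation about the face center, combined with the scale.
        M = cv2.getRotationMatrix2D((cx, cy), angle_deg, scale)
        # Shift so the face center lands at the center of the output image.
        M[0, 2] += out_size / 2.0 - cx
        M[1, 2] += out_size / 2.0 - cy
        return cv2.warpAffine(img, M, (out_size, out_size)), M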
Optionally, the style image generation apparatus 1100 provided by the embodiment of the present disclosure further includes:
a target face region acquisition module, configured to acquire a target face region in the target style face image;
a first style face image determination module, configured to adjust the position of the target face region in the target style face image to obtain a first style face image in which the position of the face region corresponds to that in the original face image.
Optionally, the style image generation apparatus 1100 provided by the embodiment of the present disclosure further includes:
a second style face image determination module, configured to fuse the target face region in the first style face image with a target background region to obtain a second style face image.
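Re-positioning the stylized face and fusing it with the background can be sketched as inverting the alignment warp and alpha-blending along a feathered face mask. The Gaussian feathering below is an assumed fusion strategy, since the disclosure does not fix the exact blending operator:

    import cv2
    import numpy as np

    def paste_back(style_face, background, M, face_mask):
        """Warp the stylized face and its single-channel mask back to the
        original image coordinates with the inverse of affine matrix M,
        then alpha-blend with the background."""
        h, w = background.shape[:2]
        inv = cv2.invertAffineTransform(M)             # undo the alignment warp
        face = cv2.warpAffine(style_face, inv, (w, h))
        mask = cv2.warpAffine(face_mask, inv, (w, h)).astype(np.float32)
        mask = cv2.GaussianBlur(mask, (21, 21), 0)[..., None] / 255.0  # soft edges
        return (mask * face + (1.0 - mask) * background).astype(np.uint8)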
The style image generation apparatus provided by the embodiments of the present disclosure can execute any style image generation method provided by the embodiments of the present disclosure, and has the functional modules and beneficial effects corresponding to the executed method. For details not described exhaustively in the apparatus embodiments of the present disclosure, reference may be made to the description in any method embodiment of the present disclosure.
FIG. 12 is a schematic structural diagram of a training apparatus for a style image generation model provided by an embodiment of the present disclosure. The embodiment of the present disclosure is applicable to training a style image generation model that meets style conversion requirements, the style image generation model being used to generate a style image corresponding to an original face image. The training apparatus may be implemented in software and/or hardware, and may be integrated on any electronic device with computing capability, such as a server.
As shown in FIG. 12, the training apparatus 1200 for the style image generation model provided by the embodiment of the present disclosure may include a sample acquisition module 1201, a first training module 1202, a model cropping module 1203, and a second training module 1204, wherein:
the sample acquisition module 1201 is configured to acquire a plurality of original face sample images and a target style face sample image corresponding to each original face sample image;
the first training module 1202 is configured to train an initial style image generation model based on the plurality of original face sample images and the target style face sample image corresponding to each original face sample image;
the model cropping module 1203 is configured to perform at least one cropping operation according to at least one set of cropping parameters based on the initial style image generation model to obtain at least one style image real-time generation model, the style image real-time generation model varying with the cropping parameters;
the second training module 1204 is configured to train the at least one style image real-time generation model based on the plurality of original face sample images and the target style face sample image corresponding to each original face sample image to obtain a trained target style image real-time generation model.
Optionally, the model cropping module 1203 is specifically configured to:
perform at least two cropping operations on the initial style image generation model according to at least two sets of cropping parameters to obtain a first style image real-time generation model and a second style image real-time generation model.
Optionally, the model cropping module 1203 includes: a first cropping parameter acquisition unit, configured to acquire a first cropping parameter of the initial style image generation model;
a first cropping unit, configured to crop the initial style image generation model based on the first cropping parameter to obtain the first style image real-time generation model;
a second cropping parameter acquisition unit, configured to acquire a second cropping parameter of the trained first style image real-time generation model;
a second cropping unit, configured to crop the trained first style image real-time generation model based on the second cropping parameter to obtain the second style image real-time generation model.
Optionally, the first cropping parameter acquisition unit is specifically configured to acquire a first importance factor of an activation layer in the initial style image generation model;
correspondingly, the first cropping unit is specifically configured to:
crop, according to the first importance factor, the activation layer in the initial style image generation model and the convolution layer corresponding to that activation layer to obtain the first style image real-time generation model;
correspondingly, the second cropping parameter acquisition unit is specifically configured to:
acquire a second importance factor of an activation layer in the trained first style image real-time generation model;
correspondingly, the second cropping unit is specifically configured to:
crop, based on the second importance factor, the activation layer of the trained first style image real-time generation model and the convolution layer corresponding to that activation layer to obtain the second style image real-time generation model.
Optionally, the first cropping parameter acquisition unit is specifically configured to:
perform a Taylor expansion calculation on the output values of the activation layer in the initial style image generation model, and take the calculation result as the first importance factor;
correspondingly, the second cropping parameter acquisition unit is specifically configured to:
perform a Taylor expansion calculation on the output values of the activation layer in the trained first style image real-time generation model, and take the calculation result as the second importance factor.
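A first-order Taylor expansion of the loss around an activation yields the familiar channel importance score |activation x gradient|; channels whose score is small contribute little to the output and are cropping candidates together with the convolution filters that produce them. A minimal PyTorch sketch, where the mean over batch and spatial positions is an assumed reduction convention:

    import torch

    def taylor_channel_importance(activation: torch.Tensor) -> torch.Tensor:
        """First-order Taylor importance factor per channel for an
        activation of shape (N, C, H, W). The activation must have been
        stored with retain_grad() and a backward pass must already have
        populated activation.grad."""
        score = (activation * activation.grad).abs()  # first-order Taylor term
        return score.mean(dim=(0, 2, 3))              # one factor per channel

    # Channels with the smallest factors are cropped, along with the
    # corresponding filters of the preceding convolution layer.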
Optionally, the first style image real-time generation model and the second style image real-time generation model are trained based on the plurality of original face sample images and the target style face sample image corresponding to each original face sample image to obtain a first target style image real-time generation model and a second target style image real-time generation model respectively, wherein the first target style image real-time generation model and the second target style image real-time generation model respectively correspond to different device performance information.
Optionally, the sample acquisition module 1201 includes:
an original face sample image acquisition unit, configured to acquire a plurality of original face sample images;
a target style face sample image acquisition unit, configured to acquire, by using a pre-trained target image model, the target style face sample image corresponding to each original face sample image.
Optionally, the target image model is obtained by training based on style face sample images generated by a standard image generation model, and the standard image generation model is obtained by training based on a plurality of standard style face sample images.
Optionally, the training apparatus 1200 for the style image generation model provided by the embodiment of the present disclosure further includes:
a face adjustment module, configured to perform face adjustment on the face region of the target style face sample image to obtain a first style face sample image, so that the plurality of original face sample images and the obtained plurality of first style face sample images are used for training to obtain the initial style image generation model and the trained style image real-time generation model.
Optionally, the face adjustment includes face shape adjustment and/or mouth adjustment.
Optionally, the face adjustment module includes a face shape adjustment unit, configured to perform face shape adjustment on the face region of the target style face sample image;
wherein the face shape adjustment unit includes:
a key point determination subunit, configured to determine initial face contour key points of the face region on the target style face sample image, and target face contour key points corresponding to the initial face contour key points, wherein the target face contour key points are determined according to face shape adjustment requirements;
a face shape adjustment subunit, configured to adjust the face contour of the face region on the target style face sample image based on the initial face contour key points and the target face contour key points to obtain the first style face sample image.
Optionally, the face shape adjustment subunit includes:
a key point moving subunit, configured to move the initial face contour key points to the target face contour key points, and to deform the face region on the target style face sample image by using a thin plate spline interpolation function;
an image rendering subunit, configured to render the deformed face region by using the face texture of the target style face sample image to obtain the first style face sample image.
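Thin plate spline interpolation fits a smooth warp that carries the initial contour key points exactly onto the target ones and interpolates in between. A minimal sketch using the TPS transformer from OpenCV's contrib shape module; the key-point arrays are placeholders, and depending on the OpenCV build the point order passed to estimateTransformation may need to be swapped:

    import cv2
    import numpy as np

    def tps_warp(image, src_pts, dst_pts):
        """Warp `image` so that src_pts move onto dst_pts using a thin
        plate spline (requires opencv-contrib-python)."""
        src = np.asarray(src_pts, np.float32).reshape(1, -1, 2)
        dst = np.asarray(dst_pts, np.float32).reshape(1, -1, 2)
        matches = [cv2.DMatch(i, i, 0) for i in range(src.shape[1])]
        tps = cv2.createThinPlateSplineShapeTransformer()
        # warpImage applies the inverse mapping internally, hence (dst, src).
        tps.estimateTransformation(dst, src, matches)
        return tps.warpImage(image)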
Optionally, the face adjustment module includes a mouth adjustment unit, configured to perform mouth adjustment on the face region of the target style face sample image;
wherein the mouth adjustment unit includes:
a mouth key point determination subunit, configured to determine mouth key points of the face region on the target style face sample image;
an incomplete style face sample image determination subunit, configured to remove the mouth, determined based on the mouth key points, from the face region of the target style face sample image to obtain an incomplete style face sample image;
a first style face sample image determination subunit, configured to fuse pre-generated mouth material with the incomplete style face sample image to obtain the first style face sample image.
Optionally, the incomplete style face sample image determination subunit includes:
a subregion determination subunit, configured to determine, based on the mouth key points, a subregion surrounding the mouth in the face region of the target style face sample image;
a mouth boundary determination subunit, configured to determine a mouth boundary line in the subregion by using a fixed boundary solving algorithm;
a mouth removal subunit, configured to remove the mouth from the face region of the target style face sample image based on the mouth boundary line to obtain the incomplete style face sample image.
Optionally, the first style face sample image determination subunit includes:
a key point alignment and deformation subunit, configured to align the key points marked on the mouth material with the mouth key points in the face region of the target style face sample image, and to deform the mouth material based on the thin plate spline interpolation function;
an image rendering subunit, configured to render the deformed mouth material by using the mouth texture of the target style face sample image.
Optionally, the first style face sample image determination subunit further includes:
an inner and outer boundary determination subunit, configured to determine an inner boundary line and an outer boundary line of the mouth contour on the mouth material;
the key point alignment and deformation subunit includes:
a key point alignment subunit, configured to align the key points marked on the mouth material with the mouth key points in the face region of the target style face sample image;
a first deformation subunit, configured to deform the region within the inner boundary line of the mouth material by using the thin plate spline interpolation function;
a second deformation subunit, configured to deform the region between the inner boundary line and the outer boundary line by using an optimization solving algorithm.
Optionally, the training apparatus 1200 for the style image generation model provided by the embodiment of the present disclosure further includes:
a face recognition module, configured to recognize a face region of an original face sample image, and to determine parameter information of a bounding box of the face region and a rotation angle of the face region;
a first face sample image determination module, configured to adjust the position of the face region based on the parameter information of the bounding box of the face region and the rotation angle of the face region to obtain a first face sample image.
The target style face sample image acquisition unit is specifically configured to acquire, based on the plurality of first face sample images and by using the pre-trained target image model, the target style face sample image corresponding to each original face sample image.
Optionally, the first face sample image determination module includes:
a first parameter acquisition unit, configured to acquire a preset face position correction parameter value and a preset image size, wherein the face position correction parameter value is used to correct the position of the face region in the position-adjusted image;
a first face sample image determination unit, configured to adjust the position of the face region based on the parameter information of the bounding box of the face region, the rotation angle of the face region, the face position correction parameter value, and the preset image size to obtain the first face sample image.
Optionally, the four sides of the bounding box of the face region are parallel to the four sides of the original face sample image, and the parameter information of the bounding box includes position parameters of the four sides in the original face sample image;
the first face sample image determination unit includes:
a first coordinate calculation subunit, configured to calculate the abscissa of the center of the face region based on the horizontal position parameters corresponding to the four sides of the bounding box;
a second coordinate calculation subunit, configured to calculate the ordinate of the center of the face region based on the vertical position parameters corresponding to the four sides of the bounding box and the face position correction parameter value;
an affine transformation matrix construction subunit, configured to construct an affine transformation matrix based on the abscissa of the center of the face region, the ordinate of the center of the face region, the rotation angle of the face region, and the preset image size;
a position adjustment subunit, configured to adjust the position of the face region based on the affine transformation matrix to obtain the first face sample image.
Optionally, the first face sample image determination module further includes:
a face cropping ratio acquisition unit, configured to acquire a preset face cropping ratio;
a face region side length determination unit, configured to calculate the side length of the face region based on the face cropping ratio and the side length of the bounding box of the face region;
a scaling value determination unit, configured to calculate a scaling value based on the side length of the face region and the preset image size;
correspondingly, the affine transformation matrix construction subunit is specifically configured to:
construct the affine transformation matrix based on the abscissa of the center of the face region, the ordinate of the center of the face region, the rotation angle of the face region, the preset image size, and the scaling value.
The training apparatus for a style image generation model provided by the embodiments of the present disclosure can execute any training method for a style image generation model provided by the embodiments of the present disclosure, and has the functional modules and beneficial effects corresponding to the executed method. For details not described exhaustively in the apparatus embodiments of the present disclosure, reference may be made to the description in any method embodiment of the present disclosure.
It should be noted that, in the embodiments of the present disclosure, some modules or units of the style image generation apparatus and of the training apparatus for the style image generation model share the same name. Those skilled in the art will understand that, for different image processing stages, the specific function of a module or unit should be understood in the context of the specific stage, and the functions implemented by modules or units must not be confused across stages.
FIG. 13 is a schematic structural diagram of an electronic device provided by an embodiment of the present disclosure, which exemplarily illustrates an electronic device for executing the style image generation method or the training method for the style image generation model in the examples of the present disclosure. The electronic device in the embodiments of the present disclosure may include, but is not limited to, mobile terminals such as mobile phones, notebook computers, digital broadcast receivers, PDAs (personal digital assistants), PADs (tablet computers), PMPs (portable multimedia players), and vehicle-mounted terminals (e.g., vehicle-mounted navigation terminals), and fixed terminals such as digital TVs and desktop computers. The electronic device shown in FIG. 13 is merely an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in FIG. 13, the electronic device 1300 may include a processing apparatus (e.g., a central processing unit or a graphics processor) 1301, which can execute various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1302 or a program loaded from a storage apparatus 1308 into a random access memory (RAM) 1303. The RAM 1303 also stores various programs and data required for the operation of the electronic device 1300. The processing apparatus 1301, the ROM 1302, and the RAM 1303 are connected to one another through a bus 1304. An input/output (I/O) interface 1305 is also connected to the bus 1304. The ROM 1302, the RAM 1303, and the storage apparatus 1308 shown in FIG. 13 may be collectively referred to as a memory for storing instructions or programs executable by the processing apparatus 1301.
Generally, the following apparatuses may be connected to the I/O interface 1305: an input apparatus 1306 including, for example, a touch screen, a touch pad, a keyboard, a mouse, a camera, a microphone, an accelerometer, and a gyroscope; an output apparatus 1307 including, for example, a liquid crystal display (LCD), a speaker, and a vibrator; a storage apparatus 1308 including, for example, a magnetic tape and a hard disk; and a communication apparatus 1309. The communication apparatus 1309 may allow the electronic device 1300 to communicate wirelessly or by wire with other devices to exchange data. Although FIG. 13 shows the electronic device 1300 having various apparatuses, it should be understood that it is not required to implement or provide all of the illustrated apparatuses; more or fewer apparatuses may alternatively be implemented or provided.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product, which includes a computer program carried on a non-transitory computer-readable medium, the computer program containing program code for executing the methods illustrated in the flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication apparatus 1309, or installed from the storage apparatus 1308, or installed from the ROM 1302. When the computer program is executed by the processing apparatus 1301, the above-described functions defined in the methods of the embodiments of the present disclosure are executed.
It should be noted that the computer-readable medium described above in the present disclosure may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. The computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the embodiments of the present disclosure, the computer-readable storage medium may be any tangible medium that contains or stores a program, where the program may be used by or in combination with an instruction execution system, apparatus, or device. In the embodiments of the present disclosure, the computer-readable signal medium may include a data signal propagated in a baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal may take a variety of forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. The computer-readable signal medium may also be any computer-readable medium other than the computer-readable storage medium, and it can send, propagate, or transmit a program for use by or in combination with an instruction execution system, apparatus, or device. The program code contained on the computer-readable medium may be transmitted by any appropriate medium, including but not limited to a wire, an optical cable, RF (radio frequency), or any suitable combination of the above.
In some embodiments, the client and the server may communicate by using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network (LAN), a wide area network (WAN), an internetwork (e.g., the Internet), and a peer-to-peer network (e.g., an ad hoc peer-to-peer network), as well as any currently known or future developed network.
The above computer-readable medium may be included in the above electronic device, or may exist alone without being assembled into the electronic device.
The computer-readable medium according to the embodiments of the present disclosure carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: acquire an original face image; and obtain a target style face image corresponding to the original face image by using a pre-trained target style image real-time generation model; wherein the target style image real-time generation model is obtained, after an initial style image generation model has been trained, by performing at least one cropping operation on the initial style image generation model according to at least one set of cropping parameters; both the initial style image generation model and the target style image real-time generation model are obtained by training based on a plurality of original face sample images and a target style face sample image corresponding to each original face sample image; and the style image real-time generation model varies with the cropping parameters.
Alternatively, the computer-readable medium according to the embodiments of the present disclosure carries one or more programs, and when the one or more programs are executed by the electronic device, the electronic device is caused to: acquire a plurality of original face sample images and a target style face sample image corresponding to each original face sample image; train an initial style image generation model based on the plurality of original face sample images and the target style face sample image corresponding to each original face sample image; perform at least one cropping operation on the initial style image generation model according to at least one set of cropping parameters to obtain a style image real-time generation model, the style image real-time generation model varying with the cropping parameters; and train the style image real-time generation model based on the plurality of original face sample images and the target style face sample image corresponding to each original face sample image to obtain a trained target style image real-time generation model.
It should be noted and understood that, when the one or more programs stored in the computer-readable medium are executed by the electronic device, the electronic device may also be caused to execute other style image generation methods or other training methods for style image generation models provided by the examples of the present disclosure.
In the embodiments of the present disclosure, the computer program code for executing the operations of the present disclosure may be written in one or more programming languages or a combination thereof. The programming languages include, but are not limited to, object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may be executed entirely on a user's computer, partly on a user's computer, as a stand-alone software package, partly on a user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (for example, through the Internet by using an Internet service provider).
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions, and operations of systems, methods, and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram may represent a module, a program segment, or a portion of code, and the module, program segment, or portion of code contains one or more executable instructions for implementing the specified logical functions. It should also be noted that, in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, and they may sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, may be implemented by a dedicated hardware-based system that executes the specified functions or operations, or may be implemented by a combination of dedicated hardware and computer instructions.
The modules or units described in the embodiments of the present disclosure may be implemented in software or in hardware. The name of a module or unit does not in some cases constitute a limitation on the module or unit itself; for example, the original face image acquisition module may also be described as "a module for acquiring an original face image".
The functions described herein above may be executed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a field programmable gate array (FPGA), an application-specific integrated circuit (ASIC), an application-specific standard product (ASSP), a system on chip (SOC), a complex programmable logic device (CPLD), and the like.
In the context of the present disclosure, a machine-readable medium may be a tangible medium that may contain or store a program for use by or in combination with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the above. More specific examples of the machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer disk, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
It should be noted that, in this document, relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another entity or operation, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a list of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes the element.
The above are only specific embodiments of the present disclosure, enabling those skilled in the art to understand or implement the present disclosure. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present disclosure. Therefore, the present disclosure will not be limited to the embodiments herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (27)

  1. 一种风格图像生成方法,其特征在于,包括:A style image generation method, comprising:
    获取原始人脸图像;Get the original face image;
    利用预先训练的目标风格图像实时生成模型,得到与所述原始人脸图像对应的目标风格人脸图像;Utilize the pre-trained target style image to generate the model in real time to obtain the target style face image corresponding to the original face image;
    其中,所述目标风格图像实时生成模型是在训练得到初始风格图像生成模型后,对基于所述初始风格图像生成模型按照至少一组裁剪参数进行至少一次裁剪操作得到的至少一个风格图像实时生成模型训练得到,并且所述初始风格图像生成模型和所述目标风格图像实时生成模型均基于多个原始人脸样本图像和与每个原始人脸样本图像对应的目标风格人脸样本图像训练得到,其中,所述风格图像实时生成模型随着所述裁剪参数的变化而变化。The target style image real-time generation model is a real-time generation model of at least one style image obtained by at least one cropping operation based on the initial style image generation model according to at least one set of cropping parameters after the initial style image generation model is obtained by training. obtained by training, and the initial style image generation model and the target style image real-time generation model are both obtained by training based on multiple original face sample images and target style face sample images corresponding to each original face sample image, wherein , the real-time generation model of the style image changes with the change of the cropping parameter.
  2. 根据权利要求1所述的方法,其特征在于,所述多个原始人脸样本图像和所述与每个原始人脸样本图像对应的目标风格人脸样本图像分别为预先训练的目标图像模型的输入和输出,所述目标图像模型用于生成原始人脸样本图像对应的目标风格人脸样本图像,以为所述初始风格图像生成模型和所述目标风格图像生成模型提供训练样本。The method according to claim 1, wherein the plurality of original face sample images and the target-style face sample image corresponding to each original face sample image are images of a pre-trained target image model, respectively. Input and output, the target image model is used to generate a target style face sample image corresponding to the original face sample image, so as to provide training samples for the initial style image generation model and the target style image generation model.
  3. 根据权利要求1所述的方法,其特征在于,基于所述初始风格图像生成模型按照至少两组裁剪参数进行至少两次裁剪操作,以相应地得到至少两个风格图像实时生成模型,对所述至少两个风格图像实时生成模型进行训练得到至少两个目标风格图像实时生成模型,并且,所述至少两个目标风格图像实时生成模型分别对应不同的设备性能信息;The method according to claim 1, characterized in that, based on the initial style image generation model, at least two cropping operations are performed according to at least two sets of cropping parameters, so as to obtain at least two style image real-time generation models correspondingly. At least two real-time generation models of style images are trained to obtain at least two real-time generation models of target style images, and the at least two real-time generation models of target style images correspond to different equipment performance information respectively;
    相应的,在所述利用预先训练的目标风格图像实时生成模型,得到与所述原始人脸图像对应的目标风格人脸图像之前,还包括:Correspondingly, before the target-style face image corresponding to the original face image is obtained by using the pre-trained target style image to generate the model in real time, the method further includes:
    基于当前设备性能信息,获取与所述当前设备性能信息相适配的目标风格图像实时生成模型。Based on the current device performance information, a target style image real-time generation model adapted to the current device performance information is acquired.
  4. 根据权利要求1至3任一项所述的方法,其特征在于,在所述获取原始人脸图像之后,还包括:The method according to any one of claims 1 to 3, wherein after the acquiring the original face image, the method further comprises:
    识别所述原始人脸图像的人脸区域,并确定人脸区域包围框的参数信息,以及所述人脸区域的旋转角度;Identify the face area of the original face image, and determine the parameter information of the face area bounding box, and the rotation angle of the face area;
    基于所述人脸区域包围框的参数信息,以及所述人脸区域的旋转角度对所述人脸区域进行位置调整,获得第一人脸图像,以基于所述第一人脸图像得到所述目标风格人脸图像。The position of the face region is adjusted based on the parameter information of the bounding box of the face region and the rotation angle of the face region to obtain a first face image, so as to obtain the first face image based on the first face image. Target style face image.
  5. 根据权利要求4所述的方法,其特征在于,还包括:The method of claim 4, further comprising:
    获取所述目标风格人脸图像中的目标人脸区域;obtaining the target face region in the target style face image;
    对所述目标风格人脸图像中的目标人脸区域进行位置调整,得到与所述原始人脸图像中人脸区域位置对应的第一风格人脸图像。The position of the target face region in the target style face image is adjusted to obtain a first style face image corresponding to the position of the face region in the original face image.
  6. 根据权利要求5所述的方法,其特征在于,还包括:The method of claim 5, further comprising:
    将所述第一风格人脸图像中的目标人脸区域与目标背景区域进行融合处理,以得到第二风格人脸图像。The target face area and the target background area in the first style face image are fused to obtain the second style face image.
  7. 一种风格图像生成模型的训练方法,其特征在于,包括:A method for training a style image generation model, comprising:
    获取多个原始人脸样本图像和与每个原始人脸样本图像对应的目标风格人脸样本图像;acquiring a plurality of original face sample images and a target style face sample image corresponding to each original face sample image;
    基于所述多个原始人脸样本图像和所述与每个原始人脸样本图像对应的目标风格人脸样本图像,训练得到初始风格图像生成模型;Based on the plurality of original face sample images and the target style face sample image corresponding to each original face sample image, an initial style image generation model is obtained by training;
    基于所述初始风格图像生成模型按照至少一组裁剪参数进行至少一次裁剪操作,以得到至少一个风格图像实时生成模型,其中,所述风格图像实时生成模型随着所述裁剪参数的变化而变化;Perform at least one cropping operation based on the initial style image generation model according to at least one set of cropping parameters to obtain at least one style image real-time generation model, wherein the style image real-time generation model changes with the change of the cropping parameters;
    基于所述多个原始人脸样本图像和所述与每个原始人脸样本图像对应的目标风格人脸样本图像,对所述至少一个风格图像实时生成模型进行训练,得到训练后的目标风格图像实时生成模型。Based on the plurality of original face sample images and the target style face sample image corresponding to each original face sample image, the at least one style image real-time generation model is trained to obtain a trained target style image Generate models in real time.
  8. 根据权利要求7所述的方法,其特征在于,所述基于所述初始风格图像生成模型按照至少一组裁剪参数进行至少一次裁剪操作,以得到至少一个风格图像实时生成模型,包括:The method according to claim 7, wherein, performing at least one cropping operation based on the initial style image generation model according to at least one set of cropping parameters to obtain at least one style image real-time generation model, comprising:
    基于所述初始风格图像生成模型按照至少两组裁剪参数进行至少二次裁剪操作,以得到第一风格图像实时生成模型和第二风格图像实时生成模型。Based on the initial style image generation model, at least two cropping operations are performed according to at least two sets of cropping parameters to obtain a first style image real-time generation model and a second style image real-time generation model.
  9. 根据权利要求8所述的方法,其特征在于,所述基于所述初始风格图像生成模型按照至少两组裁剪参数进行至少二次裁剪操作,以得到第一风格图像实时生成模型和第二风格图像实时生成模型,包括:The method according to claim 8, wherein the generating model based on the initial style image performs at least two cropping operations according to at least two sets of cropping parameters, so as to obtain a real-time generation model of the first style image and a second style image Generate models in real-time, including:
    获取所述初始风格图像生成模型的第一裁剪参数,所述第一裁剪参数为所述至少两组裁剪参数中的一组;acquiring a first cropping parameter of the initial style image generation model, where the first cropping parameter is one of the at least two groups of cropping parameters;
    基于所述第一裁剪参数,对所述初始风格图像生成模型进行裁剪,得到所述第一风格图像实时生成模型;Based on the first cropping parameters, crop the initial style image generation model to obtain the first style image real-time generation model;
    获取训练后的第一风格图像实时生成模型的第二裁剪参数,所述第二裁剪参数为所述至少两组裁剪参数中的另一组;Obtaining the second cropping parameter of the trained first style image real-time generation model, where the second cropping parameter is another group of the at least two groups of cropping parameters;
    基于所述第二裁剪参数,对所述训练后的第一风格图像实时生成模型进行裁剪,得到所述第二风格图像实时生成模型。Based on the second cropping parameters, the trained first style image real-time generation model is cropped to obtain the second style image real-time generation model.
  10. 根据权利要求9所述的方法,其特征在于,所述获取所述初始风格图像生成模型的第一裁剪参数,包括:The method according to claim 9, wherein the acquiring the first cropping parameter of the initial style image generation model comprises:
    获取所述初始风格图像生成模型中激活层的第一重要因子;obtaining the first important factor of the activation layer in the initial style image generation model;
    相应的,所述基于所述第一裁剪参数,对所述初始风格图像生成模型进行裁剪,得到第一风格图像实时生成模型,包括:Correspondingly, according to the first cropping parameter, the initial style image generation model is cropped to obtain the first style image real-time generation model, including:
    根据所述第一重要因子,对所述初始风格图像生成模型中的激活层以及与该激活层对应的卷积层进行裁剪,得到所述第一风格图像实时生成模型;According to the first important factor, the activation layer in the initial style image generation model and the convolution layer corresponding to the activation layer are cropped to obtain the first style image real-time generation model;
    相应的,所述获取训练后的第一风格图像实时生成模型的第二裁剪参数,包括:Correspondingly, the second cropping parameters of the real-time generation model of the acquired first style image after training include:
    获取训练后的第一风格图像实时生成模型中激活层的第二重要因子;Obtain the second important factor of the activation layer in the real-time generation model of the first style image after training;
    相应的,所述基于所述第二裁剪参数,对所述训练后的第一风格图像实时生成模型进行裁剪,得到第二风格图像实时生成模型,包括:Correspondingly, according to the second cropping parameter, the first style image real-time generation model after training is cropped to obtain the second style image real-time generation model, including:
    基于所述第二重要因子,对所述训练后的第一风格图像实时生成模型的激活层以及与该激活层对应的卷积层进行裁剪,得到所述第二风格图像实时生成模型。Based on the second important factor, the activation layer of the trained first style image real-time generation model and the convolution layer corresponding to the activation layer are cropped to obtain the second style image real-time generation model.
  11. 根据权利要求10所述的方法,其特征在于,所述获取所述初始风格图像生成模型中激活层的第一重要因子,包括:The method according to claim 10, wherein the obtaining the first important factor of the activation layer in the initial style image generation model comprises:
    对所述初始风格图像生成模型中激活层的输出值进行泰勒展开计算,并将计算结果作为所述第一重要因子;Perform Taylor expansion calculation on the output value of the activation layer in the initial style image generation model, and use the calculation result as the first important factor;
    相应的,所述获取所述训练后的第一风格图像实时生成模型中激活层的第二重要因子,包括:Correspondingly, obtaining the second important factor of the activation layer in the real-time generation model of the first style image after training includes:
    对所述训练后的第一风格图像实时生成模型中激活层的输出值进行泰勒展开计算,并将计算结果作为所述第二重要因子。Perform Taylor expansion calculation on the output value of the activation layer in the real-time generation model of the first style image after training, and use the calculation result as the second important factor.
  12. 根据权利要求9所述的方法,其特征在于,基于所述多个原始人脸样本图像和所述与每个原始人脸样本图像对应的目标风格人脸样本图像对所述第一风格图像实时生成模型和所述第二风格图像实时生成模型进行训练,以分别得到第一目标风格图像实时生成模型和第二目标风格图像实时生成模型,其中,所述第一目标风格图像实时生成模型和第二目标风格图像实时生成模型分别对应不同的设备性能信息。The method according to claim 9, wherein, based on the plurality of original face sample images and the target style face sample image corresponding to each original face sample image, the first style image is processed in real time The generation model and the second style image real-time generation model are trained to obtain the first target style image real-time generation model and the second target style image real-time generation model, wherein the first target style image real-time generation model and the second target style image real-time generation model are trained. The two target style image real-time generation models correspond to different device performance information respectively.
  13. The method according to claim 7, wherein the acquiring a plurality of original face sample images and a target-style face sample image corresponding to each original face sample image comprises:
    acquiring a plurality of original face sample images;
    acquiring, by using a pre-trained target image model, the target-style face sample image corresponding to each original face sample image, wherein the target image model is used to generate the target-style face sample image corresponding to an original face sample image, so as to provide training samples for the initial style image generation model and the target-style image generation model.
  14. The method according to claim 13, wherein the target image model is trained based on style face sample images generated by a standard image generation model, and the standard image generation model is trained based on a plurality of standard-style face sample images.
  15. The method according to claim 7, wherein after the acquiring a plurality of original face sample images and a target-style face sample image corresponding to each original face sample image, the method further comprises:
    performing face adjustment on a face region of the target-style face sample image to obtain a first-style face sample image, so that the plurality of original face sample images and the plurality of obtained first-style face sample images are used for training to obtain the initial style image generation model and the trained style image real-time generation model.
  16. The method according to claim 15, wherein the face adjustment comprises face shape adjustment and/or mouth adjustment.
  17. The method according to claim 16, wherein performing face shape adjustment on the face region of the target-style face sample image comprises:
    determining initial face contour key points of the face region of the target-style face sample image, and target face contour key points corresponding to the initial face contour key points, wherein the target face contour key points are determined according to a face shape adjustment requirement;
    adjusting, based on the initial face contour key points and the target face contour key points, a face contour of the face region of the target-style face sample image, to obtain the first-style face sample image.
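Claim 17 leaves the face-shape adjustment requirement open. As a purely hypothetical example of deriving target contour key points from the initial ones, a "slim face" requirement could pull jawline points toward the contour centroid:

```python
import numpy as np

def slim_face_targets(contour_pts: np.ndarray, strength: float = 0.1) -> np.ndarray:
    """Toy 'slim face' requirement: pull each contour key point toward the
    centroid of the contour. `contour_pts` is an (N, 2) array of (x, y)
    coordinates; `strength` in [0, 1] sets how far points move. Purely
    illustrative -- the patent does not prescribe any particular rule."""
    center = contour_pts.mean(axis=0)
    return contour_pts + strength * (center - contour_pts)
```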
  18. The method according to claim 17, wherein the adjusting, based on the initial face contour key points and the target face contour key points, the face contour of the face region of the target-style face sample image comprises:
    moving the initial face contour key points to the target face contour key points, and performing deformation processing on the face region of the target-style face sample image by using a thin plate spline interpolation function;
    rendering the deformed face region by using a face texture of the target-style face sample image, to obtain the first-style face sample image.
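Thin plate spline (TPS) interpolation, named explicitly in claim 18, fits the smoothest (minimal bending energy) mapping through the key-point correspondences and applies it to the whole face region. Below is a self-contained numpy/scipy sketch; the unregularized linear solve and bilinear resampling are choices of this sketch, not requirements of the patent.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def _tps_kernel(r2):
    # TPS radial basis U(r) = r^2 log(r^2), with U(0) = 0
    return np.where(r2 == 0, 0.0, r2 * np.log(np.maximum(r2, 1e-12)))

def fit_tps(src_pts, dst_pts):
    """Solve for a TPS that maps dst_pts -> src_pts (an inverse warp).

    Returns an (n+3, 2) parameter matrix: n radial weights followed by the
    affine part [1, x, y]. Assumes the points are not all collinear."""
    n = len(dst_pts)
    d2 = ((dst_pts[:, None, :] - dst_pts[None, :, :]) ** 2).sum(-1)
    A = np.zeros((n + 3, n + 3))
    A[:n, :n] = _tps_kernel(d2)
    A[:n, n:] = np.hstack([np.ones((n, 1)), dst_pts])
    A[n:, :n] = A[:n, n:].T
    b = np.zeros((n + 3, 2))
    b[:n] = src_pts
    return np.linalg.solve(A, b)

def warp_tps(image, src_pts, dst_pts):
    """Warp `image` so the pixels at src_pts land on dst_pts."""
    src_pts = np.asarray(src_pts, float)
    dst_pts = np.asarray(dst_pts, float)
    params = fit_tps(src_pts, dst_pts)
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    grid = np.stack([xs.ravel(), ys.ravel()], axis=1).astype(float)
    d2 = ((grid[:, None, :] - dst_pts[None, :, :]) ** 2).sum(-1)
    basis = np.hstack([_tps_kernel(d2), np.ones((grid.shape[0], 1)), grid])
    src_xy = basis @ params  # source coordinate for every output pixel
    coords = [src_xy[:, 1].reshape(h, w), src_xy[:, 0].reshape(h, w)]
    if image.ndim == 2:
        return map_coordinates(image, coords, order=1, mode='nearest')
    return np.stack([map_coordinates(image[..., c], coords, order=1,
                                     mode='nearest')
                     for c in range(image.shape[-1])], axis=-1)
```

Because the spline is fitted as an inverse map (target points back to source points), every output pixel samples the original face texture, which is one way to read the claim's "render the deformed face region using the face texture".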
  19. The method according to claim 16, wherein performing mouth adjustment on the face region of the target-style face sample image comprises:
    determining mouth key points of the face region of the target-style face sample image;
    removing a mouth determined based on the mouth key points from the face region of the target-style face sample image, to obtain an incomplete-style face sample image;
    fusing a pre-generated mouth material with the incomplete-style face sample image, to obtain the first-style face sample image.
  20. The method according to claim 19, wherein the removing the mouth determined based on the mouth key points from the face region of the target-style face sample image to obtain the incomplete-style face sample image comprises:
    determining, based on the mouth key points, a sub-region surrounding the mouth in the face region of the target-style face sample image;
    determining a mouth boundary line in the sub-region by using a fixed boundary solving algorithm;
    removing, based on the mouth boundary line, the mouth from the face region of the target-style face sample image, to obtain the incomplete-style face sample image.
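The "fixed boundary solving algorithm" of claim 20 is not spelled out, so the sketch below substitutes a plausible stand-in: take the convex hull of the mouth key points as the boundary line, dilate it for a safety margin, and blank the enclosed pixels. The function name and the margin parameter are illustrative.

```python
import cv2
import numpy as np

def remove_mouth(face_bgr: np.ndarray, mouth_pts: np.ndarray,
                 margin_px: int = 5):
    """Cut the mouth out of a face crop.

    The convex hull of the mouth key points plays the role of the boundary
    line; hull + dilation is only a stand-in for the patent's unspecified
    boundary-solving step. Returns the incomplete face image and the mask
    of the removed region."""
    mask = np.zeros(face_bgr.shape[:2], np.uint8)
    hull = cv2.convexHull(mouth_pts.astype(np.int32))
    cv2.fillConvexPoly(mask, hull, 255)
    kernel = np.ones((2 * margin_px + 1, 2 * margin_px + 1), np.uint8)
    mask = cv2.dilate(mask, kernel)          # widen the boundary slightly
    incomplete = face_bgr.copy()
    incomplete[mask > 0] = 0                 # blank the mouth region
    return incomplete, mask
```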
  21. The method according to claim 19, wherein the fusing the pre-generated mouth material with the incomplete-style face sample image comprises:
    aligning key points annotated on the mouth material with the mouth key points of the face region of the target-style face sample image, and performing deformation processing on the mouth material based on a thin plate spline interpolation function;
    rendering the deformed mouth material by using a mouth texture of the target-style face sample image.
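Claim 21 also leaves the final texture-rendering step abstract. One common way to blend a warped mouth patch into a face so that colours match at the seam is Poisson blending; the sketch below uses OpenCV's `seamlessClone` as an assumed stand-in, since the patent does not name a specific blending method. It assumes `warped_mouth` (e.g. output of the TPS warp above) and `mouth_mask` have the same size as the face image.

```python
import cv2
import numpy as np

def fuse_mouth(incomplete_face: np.ndarray, warped_mouth: np.ndarray,
               mouth_mask: np.ndarray) -> np.ndarray:
    """Blend a TPS-warped mouth material into the incomplete face.

    Poisson blending stands in for the claim's "render with the mouth
    texture of the target-style image"; this choice is an assumption of
    the sketch, not the patent's method."""
    ys, xs = np.nonzero(mouth_mask)
    center = (int(xs.mean()), int(ys.mean()))  # placement of the material
    return cv2.seamlessClone(warped_mouth, incomplete_face,
                             mouth_mask, center, cv2.NORMAL_CLONE)
```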
  22. The method according to claim 21, wherein the fusing the pre-generated mouth material with the incomplete-style face sample image further comprises:
    determining an inner boundary line and an outer boundary line of a mouth contour on the mouth material;
    and the performing deformation processing on the mouth material based on the thin plate spline interpolation function comprises:
    performing deformation processing on a region within the inner boundary line of the mouth material by using the thin plate spline interpolation function;
    performing deformation processing on a region between the inner boundary line and the outer boundary line by using an optimization solving algorithm.
  23. The method according to any one of claims 7 to 22, wherein after the acquiring the plurality of original face sample images, the method further comprises:
    performing the following operations for each original face sample image:
    identifying a face region of the original face sample image, and determining parameter information of a bounding box of the face region and a rotation angle of the face region;
    adjusting a position of the face region based on the parameter information of the bounding box of the face region and the rotation angle of the face region, to obtain a first face sample image;
    acquiring, based on a plurality of first face sample images and by using the pre-trained target image model, the target-style face sample image corresponding to each original face sample image.
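The pre-processing in claim 23 normalizes each sample: rotate the face upright about its bounding-box centre, then crop and resize. A minimal OpenCV sketch, assuming an external detector supplies an `(x, y, w, h)` box and an in-plane rotation angle; the 256-pixel output size and the crude edge handling are choices of this sketch:

```python
import cv2
import numpy as np

def align_face(image: np.ndarray, box: tuple, angle_deg: float,
               out_size: int = 256) -> np.ndarray:
    """Rotate the face upright about the box centre, then crop and resize.

    `box` is (x, y, w, h) of the detected face region and `angle_deg` its
    in-plane rotation, both from an external face detector."""
    x, y, w, h = box
    center = (x + w / 2.0, y + h / 2.0)
    M = cv2.getRotationMatrix2D(center, angle_deg, 1.0)
    rotated = cv2.warpAffine(image, M, (image.shape[1], image.shape[0]))
    x0, y0 = int(center[0] - w / 2), int(center[1] - h / 2)
    crop = rotated[max(0, y0): y0 + h, max(0, x0): x0 + w]
    return cv2.resize(crop, (out_size, out_size))
```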
  24. A style image generation apparatus, comprising:
    an original face image acquisition module, configured to acquire an original face image;
    a target-style face image generation module, configured to obtain a target-style face image corresponding to the original face image by using a pre-trained target-style image real-time generation model;
    wherein the target-style image real-time generation model is obtained by training at least one style image real-time generation model, the at least one style image real-time generation model being obtained by performing at least one cropping operation on the basis of an initial style image generation model according to at least one set of cropping parameters after the initial style image generation model is obtained by training; both the initial style image generation model and the target-style image real-time generation model are trained based on a plurality of original face sample images and a target-style face sample image corresponding to each original face sample image; and the style image real-time generation model changes as the cropping parameters change.
  25. A training apparatus for a style image generation model, comprising:
    a sample acquisition module, configured to acquire a plurality of original face sample images and a target-style face sample image corresponding to each original face sample image;
    a first training module, configured to train an initial style image generation model based on the plurality of original face sample images and the target-style face sample image corresponding to each original face sample image;
    a model cropping module, configured to perform at least one cropping operation on the basis of the initial style image generation model according to at least one set of cropping parameters, to obtain at least one style image real-time generation model;
    a second training module, configured to train the at least one style image real-time generation model based on the plurality of original face sample images and the target-style face sample image corresponding to each original face sample image, to obtain a trained target-style image real-time generation model, wherein the style image real-time generation model changes as the cropping parameters change.
  26. An electronic device, comprising:
    a processing device; and
    a memory, configured to store instructions executable by the processing device;
    wherein the processing device is configured to read the executable instructions from the memory and execute the executable instructions to implement the style image generation method according to any one of claims 1 to 6, or to implement the training method for a style image generation model according to any one of claims 7 to 23.
  27. A computer-readable storage medium, storing a computer program which, when executed by a processing device, implements the style image generation method according to any one of claims 1 to 6, or implements the training method for a style image generation model according to any one of claims 7 to 23.
PCT/CN2021/113225 2020-09-30 2021-08-18 Style image generation method and apparatus, model training method and apparatus, device, and medium WO2022068451A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011066405.7A CN112991358A (en) 2020-09-30 2020-09-30 Method for generating style image, method, device, equipment and medium for training model
CN202011066405.7 2020-09-30

Publications (1)

Publication Number Publication Date
WO2022068451A1 (en)

Family

ID=76344350

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/113225 WO2022068451A1 (en) 2020-09-30 2021-08-18 Style image generation method and apparatus, model training method and apparatus, device, and medium

Country Status (2)

Country Link
CN (1) CN112991358A (en)
WO (1) WO2022068451A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114897672A (en) * 2022-05-31 2022-08-12 北京外国语大学 Image cartoon style migration method based on equal deformation constraint
CN116862757A (en) * 2023-05-19 2023-10-10 上海任意门科技有限公司 Method, device, electronic equipment and medium for controlling face stylization degree

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991358A (en) * 2020-09-30 2021-06-18 北京字节跳动网络技术有限公司 Method for generating style image, method, device, equipment and medium for training model
CN113222993A (en) * 2021-06-25 2021-08-06 北京市商汤科技开发有限公司 Image processing method, device, equipment and storage medium
CN113961746B (en) * 2021-09-29 2023-11-21 北京百度网讯科技有限公司 Video generation method, device, electronic equipment and readable storage medium
CN114004905B (en) * 2021-10-25 2024-03-29 北京字节跳动网络技术有限公司 Method, device, equipment and storage medium for generating character style pictogram
CN114429664A (en) * 2022-01-29 2022-05-03 脸萌有限公司 Video generation method and training method of video generation model
CN115082299B (en) * 2022-07-21 2022-11-25 中国科学院自动化研究所 Method, system and equipment for converting different source images of small samples in non-strict alignment

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109325988A (en) * 2017-07-31 2019-02-12 腾讯科技(深圳)有限公司 A kind of facial expression synthetic method, device and electronic equipment
CN110414378A (en) * 2019-07-10 2019-11-05 南京信息工程大学 A kind of face identification method based on heterogeneous facial image fusion feature
US20190370936A1 (en) * 2018-06-04 2019-12-05 Adobe Inc. High Resolution Style Transfer
CN111062382A (en) * 2019-10-30 2020-04-24 北京交通大学 Channel pruning method for target detection network
CN111243050A (en) * 2020-01-08 2020-06-05 浙江省北大信息技术高等研究院 Portrait simple stroke generation method and system and drawing robot
CN111563455A (en) * 2020-05-08 2020-08-21 南昌工程学院 Damage identification method based on time series signal and compressed convolution neural network
CN112991358A (en) * 2020-09-30 2021-06-18 北京字节跳动网络技术有限公司 Method for generating style image, method, device, equipment and medium for training model

Family Cites Families (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1462982A (en) * 2002-05-29 2003-12-24 明日工作室股份有限公司 Intelligent method and system for creating motion pictures
CN100585775C (en) * 2007-01-30 2010-01-27 南京理工大学 Judgement method for surface cleanness after GaAs photoelectric cathode anneal
CN101034481A (en) * 2007-04-06 2007-09-12 湖北莲花山计算机视觉和信息科学研究院 Method for automatically generating portrait painting
CN104657974A (en) * 2013-11-25 2015-05-27 腾讯科技(上海)有限公司 Image processing method and device
US10796480B2 (en) * 2015-08-14 2020-10-06 Metail Limited Methods of generating personalized 3D head models or 3D body models
CN106934822B (en) * 2017-03-13 2019-09-13 浙江优迈德智能装备有限公司 Automobile workpiece non-rigid three-dimensional point cloud method for registering based on linear hybrid deformation
CN108229278B (en) * 2017-04-14 2020-11-17 深圳市商汤科技有限公司 Face image processing method and device and electronic equipment
CN107105315A (en) * 2017-05-11 2017-08-29 广州华多网络科技有限公司 Live broadcasting method, the live broadcasting method of main broadcaster's client, main broadcaster's client and equipment
CN107832844A (en) * 2017-10-30 2018-03-23 上海寒武纪信息科技有限公司 A kind of information processing method and Related product
CN108171789B (en) * 2017-12-21 2022-01-18 迈吉客科技(北京)有限公司 Virtual image generation method and system
CN108470320B (en) * 2018-02-24 2022-05-20 中山大学 Image stylization method and system based on CNN
CN108537725A (en) * 2018-04-10 2018-09-14 光锐恒宇(北京)科技有限公司 A kind of method for processing video frequency and device
CN109272111A (en) * 2018-08-15 2019-01-25 东南大学 A kind of neural network element implementation method based on chemical reaction network
CN109255831B (en) * 2018-09-21 2020-06-12 南京大学 Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning
CN109410131B (en) * 2018-09-28 2020-08-04 杭州格像科技有限公司 Face beautifying method and system based on condition generation antagonistic neural network
CN109712080A (en) * 2018-10-12 2019-05-03 迈格威科技有限公司 Image processing method, image processing apparatus and storage medium
CN111488759A (en) * 2019-01-25 2020-08-04 北京字节跳动网络技术有限公司 Image processing method and device for animal face
CN109816098B (en) * 2019-01-25 2021-09-07 京东方科技集团股份有限公司 Processing method and evaluation method of neural network, and data analysis method and device
CN110072047B (en) * 2019-01-25 2020-10-09 北京字节跳动网络技术有限公司 Image deformation control method and device and hardware device
CN110070540B (en) * 2019-04-28 2023-01-10 腾讯科技(深圳)有限公司 Image generation method and device, computer equipment and storage medium
CN110826593B (en) * 2019-09-29 2021-02-05 腾讯科技(深圳)有限公司 Training method for fusion image processing model, image processing method and device
CN110930297B (en) * 2019-11-20 2023-08-18 咪咕动漫有限公司 Style migration method and device for face image, electronic equipment and storage medium
CN111145283A (en) * 2019-12-13 2020-05-12 北京智慧章鱼科技有限公司 Expression personalized generation method and device for input method
CN111222041A (en) * 2019-12-30 2020-06-02 北京达佳互联信息技术有限公司 Shooting resource data acquisition method and device, electronic equipment and storage medium
CN111160264B (en) * 2019-12-30 2023-05-12 中山大学 Cartoon character identity recognition method based on generation countermeasure network
CN111243051B (en) * 2020-01-08 2023-08-18 杭州未名信科科技有限公司 Portrait photo-based simple drawing generation method, system and storage medium
CN111429415B (en) * 2020-03-18 2020-12-08 东华大学 Method for constructing efficient detection model of product surface defects based on network collaborative pruning
CN111626113A (en) * 2020-04-20 2020-09-04 北京市西城区培智中心学校 Facial expression recognition method and device based on facial action unit
CN111626968B (en) * 2020-04-29 2022-08-26 杭州火烧云科技有限公司 Pixel enhancement design method based on global information and local information
CN111696028A (en) * 2020-05-22 2020-09-22 华南理工大学 Method and device for processing cartoon of real scene image, computer equipment and storage medium


Also Published As

Publication number Publication date
CN112991358A (en) 2021-06-18

Similar Documents

Publication Publication Date Title
WO2022068451A1 (en) Style image generation method and apparatus, model training method and apparatus, device, and medium
US11410284B2 (en) Face beautification method and apparatus, computer device, and storage medium
WO2022068487A1 (en) Styled image generation method, model training method, apparatus, device, and medium
WO2022012085A1 (en) Face image processing method and apparatus, storage medium, and electronic device
WO2022042290A1 (en) Virtual model processing method and apparatus, electronic device and storage medium
CN111243049B (en) Face image processing method and device, readable medium and electronic equipment
WO2023093897A1 (en) Image processing method and apparatus, electronic device, and storage medium
CN112927362A (en) Map reconstruction method and device, computer readable medium and electronic device
WO2023284401A1 (en) Image beautification processing method and apparatus, storage medium, and electronic device
WO2023138560A1 (en) Stylized image generation method and apparatus, electronic device, and storage medium
CN111950570B (en) Target image extraction method, neural network training method and device
WO2022262474A1 (en) Zoom control method and apparatus, electronic device, and computer-readable storage medium
WO2022132032A1 (en) Portrait image processing method and device
CN115578515B (en) Training method of three-dimensional reconstruction model, three-dimensional scene rendering method and device
WO2023207379A1 (en) Image processing method and apparatus, device and storage medium
CN114596383A (en) Line special effect processing method and device, electronic equipment, storage medium and product
CN113902636A (en) Image deblurring method and device, computer readable medium and electronic equipment
WO2023193613A1 (en) Highlight shading method and apparatus, and medium and electronic device
CN115908679A (en) Texture mapping method, device, equipment and storage medium
US20230284768A1 (en) Beauty makeup special effect generation method, device, and storage medium
CN115082636B (en) Single image three-dimensional reconstruction method and device based on mixed Gaussian network
CN111667553A (en) Head-pixelized face color filling method and device and electronic equipment
WO2023040813A1 (en) Facial image processing method and apparatus, and device and medium
CN113240599A (en) Image toning method and device, computer-readable storage medium and electronic equipment
CN116137025A (en) Video image correction method and device, computer readable medium and electronic equipment

Legal Events

Date Code Title Description

121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 21874113; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
    Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established
    Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.07.2023)

122 Ep: pct application non-entry in european phase
    Ref document number: 21874113; Country of ref document: EP; Kind code of ref document: A1