CN112132912A - Method and device for establishing face generation model and generating face image

Info

Publication number
CN112132912A
CN112132912A (application number CN201910556085.4A)
Authority
CN
China
Prior art keywords
image
face
model
generation
images
Prior art date
Legal status
Granted
Application number
CN201910556085.4A
Other languages
Chinese (zh)
Other versions
CN112132912B (en)
Inventor
李鑫
刘霄
张赫男
赵翔
李甫
何栋梁
龙翔
周志超
孙昊
文石磊
丁二锐
Current Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority claimed from CN201910556085.4A
Publication of CN112132912A
Application granted
Publication of CN112132912B
Status: Active

Classifications

    • G PHYSICS; G06 COMPUTING, CALCULATING OR COUNTING; G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/00 2D [Two Dimensional] image generation
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4038 Image mosaicing, e.g. composing plane images from plane sub-images
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person
    • G06T2207/30201 Face

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention provides a method for establishing a face generation model, comprising: acquiring face images; extracting an image of a preset part and a face edge image from each face image and stitching the extracted images into a stitched image corresponding to that face image, wherein the image of the preset part is a mouth image; constructing a generative adversarial network comprising a generative model and a discriminative model; and training the generative adversarial network on the face images and their corresponding stitched images, the generative model in the trained network serving as the face generation model. The invention also provides a method for generating a face image, comprising: acquiring a mouth image; extracting a face edge image of the face in a template image and stitching it with the mouth image to obtain an input image; and feeding the input image into the face generation model to obtain a face image from its output. The invention can generate high-definition, realistic face images.

Description

Method and device for establishing face generation model and generating face image
[ technical field ]
The present invention relates to the field of image processing technologies, and in particular, to a method, an apparatus, a device, and a computer storage medium for establishing a face generation model and generating a face image.
[ background of the invention ]
In the related art, face images are generally generated using 2D or 3D techniques. However, face images generated by 2D techniques are blurry, and the expressions of face images generated by 3D techniques are stiff. A method capable of generating high-definition, realistic face images is therefore desirable.
[ summary of the invention ]
In view of the above, the present invention provides a method, an apparatus, a device, and a computer storage medium for establishing a face generation model and generating a face image, which are used to generate high-definition, realistic face images.
The technical solution adopted by the invention is a method for establishing a face generation model, the method comprising: acquiring face images; extracting an image of a preset part and a face edge image from each face image, and stitching the extracted images into a stitched image corresponding to that face image, wherein the image of the preset part is a mouth image; constructing a generative adversarial network comprising a generative model and a discriminative model; and training the generative adversarial network on the face images and their corresponding stitched images, the generative model in the trained network serving as the face generation model.
According to a preferred embodiment of the present invention, after the face images are acquired, the method further comprises: acquiring the resolution of each face image, and filtering out face images whose resolution is below a preset threshold.
According to a preferred embodiment of the present invention, the image of the preset part further includes an eye image and an eyebrow image; the face edge image is an image with the mouth, nose, and chin of the face removed.
According to a preferred embodiment of the present invention, constructing a generative adversarial network comprising a generative model and a discriminative model includes: combining N discriminators into the discriminative model, where the input of each discriminator corresponds to image blocks of a different scale and N is a positive integer greater than or equal to 2.
According to a preferred embodiment of the present invention, training the generative adversarial network on the face images and their corresponding stitched images comprises: taking each face image as a real sample; inputting the stitched image into the generative model and taking its output as a generated sample; feeding the real sample and its corresponding generated sample to the discriminative model and deriving the loss functions of the discriminative and generative models from the discriminative model's output; and adjusting parameters in the network structures of the generative and discriminative models according to these loss functions until the generative adversarial network converges.
According to a preferred embodiment of the present invention, taking the real sample and its corresponding generated sample as the input of the discriminative model comprises: extracting N image blocks of different scales from the real sample; extracting N image blocks of the same scales from the same positions of the generated sample; feeding each pair of image blocks of the same scale to the discriminator of that scale; and concatenating the outputs of the per-scale discriminators as the output of the discriminative model.
The technical solution adopted by the invention further provides a method for generating a face image, the method comprising: acquiring a mouth image; extracting a face edge image of the face in a template image, and stitching the face edge image with the mouth image to obtain an input image; and inputting the input image into a face generation model and obtaining a face image from the model's output.
According to a preferred embodiment of the present invention, acquiring a mouth image comprises: acquiring a text; converting the text into speech; and generating a mouth image based on the converted speech.
According to a preferred embodiment of the invention, the method further comprises: extracting an eye image and an eyebrow image of the face in the template image; and stitching the mouth image, the eye image, the eyebrow image, and the face edge image to obtain the input image.
The technical solution adopted by the invention further provides an apparatus for establishing a face generation model, the apparatus comprising: a first acquisition unit configured to acquire face images; a first stitching unit configured to extract an image of a preset part and a face edge image from each face image and stitch the extracted images into a stitched image corresponding to that face image, wherein the image of the preset part is a mouth image; a construction unit configured to construct a generative adversarial network comprising a generative model and a discriminative model; and a training unit configured to train the generative adversarial network on the face images and their corresponding stitched images and to take the generative model in the trained network as the face generation model.
According to a preferred embodiment of the present invention, after acquiring the face images, the first acquisition unit further: acquires the resolution of each face image and filters out face images whose resolution is below a preset threshold.
According to a preferred embodiment of the present invention, the image of the preset part further includes an eye image and an eyebrow image; the face edge image is an image with the mouth, nose, and chin of the face removed.
According to a preferred embodiment of the present invention, when constructing the generative adversarial network comprising a generative model and a discriminative model, the construction unit specifically: combines N discriminators into the discriminative model, where the input of each discriminator corresponds to image blocks of a different scale and N is a positive integer greater than or equal to 2.
According to a preferred embodiment of the present invention, when training the generative adversarial network on the face images and their corresponding stitched images, the training unit specifically: takes each face image as a real sample; inputs the stitched image into the generative model and takes its output as a generated sample; feeds the real sample and its corresponding generated sample to the discriminative model and derives the loss functions of the discriminative and generative models from the discriminative model's output; and adjusts parameters in the network structures of the generative and discriminative models according to these loss functions until the generative adversarial network converges.
According to a preferred embodiment of the present invention, when taking the real sample and its corresponding generated sample as the input of the discriminative model, the training unit specifically: extracts N image blocks of different scales from the real sample; extracts N image blocks of the same scales from the same positions of the generated sample; feeds each pair of image blocks of the same scale to the discriminator of that scale; and concatenates the outputs of the per-scale discriminators as the output of the discriminative model.
The technical solution adopted by the invention further provides an apparatus for generating a face image, the apparatus comprising: a second acquisition unit configured to acquire a mouth image; a second stitching unit configured to extract a face edge image of the face in a template image and stitch the face edge image with the mouth image to obtain an input image; and a processing unit configured to input the input image into a face generation model and obtain a face image from the model's output.
According to a preferred embodiment of the present invention, when acquiring the mouth image, the second acquisition unit specifically: acquires a text; converts the text into speech; and generates a mouth image based on the converted speech.
According to a preferred embodiment of the present invention, the second stitching unit is further configured to: extract an eye image and an eyebrow image of the face in the template image; and stitch the mouth image, the eye image, the eyebrow image, and the face edge image to obtain the input image.
As can be seen from the above technical solution, the method and apparatus train the generative adversarial network on stitched images obtained from the images of preset parts and the face edge images extracted from the face images. This fully accounts for the fact that differences in these facial parts during speech affect the face image, so the generative model in the trained network can generate clearer and more realistic face images.
[ description of the drawings ]
Fig. 1 is a flowchart of a method for establishing a face generation model according to an embodiment of the present invention;
Fig. 2 is a flowchart of a method for generating a face image according to an embodiment of the present invention;
Fig. 3 is a structural diagram of an apparatus for establishing a face generation model according to an embodiment of the present invention;
Fig. 4 is a structural diagram of an apparatus for generating a face image according to an embodiment of the present invention;
Fig. 5 is a block diagram of a computer system/server according to an embodiment of the present invention.
[ detailed description ]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in detail with reference to the accompanying drawings and specific embodiments.
The terminology used in the embodiments of the invention is for the purpose of describing particular embodiments only and is not intended to limit the invention. As used in the embodiments of the present invention and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" as used herein merely describes an association between related objects, indicating that three relationships may exist; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" herein generally indicates that the related objects before and after it are in an "or" relationship.
Depending on the context, the word "if" as used herein may be interpreted as "when," "upon," "in response to determining," or "in response to detecting." Similarly, the phrases "if it is determined" or "if (a stated condition or event) is detected" may be interpreted as "when it is determined," "in response to determining," "when (a stated condition or event) is detected," or "in response to detecting (a stated condition or event)."
Fig. 1 is a flowchart of a method for establishing a face generation model according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
In 101, face images are acquired.
In this step, face images are acquired for training the generative adversarial network that establishes the face generation model. Face images may be collected from the Internet by a web crawler, or extracted from the individual frames of a given video. The manner of acquiring face images is not limited here.
In addition, to enable the established face generation model to generate high-definition face images, the following may further be performed after the face images are acquired: acquiring the resolution of each face image, and filtering out face images whose resolution is below a preset threshold. That is, lower-resolution face images are discarded in this step so that the face generation model is built from sharper face images.
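For illustration, a minimal sketch of this resolution filter (the threshold value, file layout, and use of Pillow are assumptions; the patent only specifies filtering by a preset threshold):

```python
from pathlib import Path

from PIL import Image  # pip install pillow

MIN_SIDE = 512  # assumed preset threshold in pixels

def filter_low_resolution(image_dir: str) -> list[Path]:
    """Keep only face images whose shorter side meets the preset threshold."""
    kept = []
    for path in Path(image_dir).glob("*.jpg"):
        with Image.open(path) as img:
            width, height = img.size
        # Discard blurrier, low-resolution faces so training uses sharp images.
        if min(width, height) >= MIN_SIDE:
            kept.append(path)
    return kept
```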
At 102, an image of a preset part and a face edge image are extracted from each face image, and the extracted images are stitched into a stitched image corresponding to that face image, wherein the image of the preset part is a mouth image.
In this step, an image of the preset part and a face edge image are extracted from each face image acquired in step 101, and the extracted images are stitched into the stitched image corresponding to that face image. The image of the preset part is the mouth image in the face image, and may further include the eye image and the eyebrow image.
Since the lower half of the face changes with the mouth shape as the user speaks, the face edge image in this step is an image with the mouth, nose, and chin removed from the face image.
In this step, a face keypoint detection technique may be used to extract the eye, eyebrow, and mouth images from the face image, and an edge detection technique may be used to obtain a face edge image with the mouth, nose, and chin removed. Both extractions use existing techniques and are not detailed further here.
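As a sketch of this extraction step (the patent names no specific detector; dlib's 68-point landmark model and a Canny edge detector are assumed stand-ins, and the landmark model file is the one distributed with dlib's examples):

```python
import cv2    # pip install opencv-python
import dlib   # pip install dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_parts_and_edges(bgr: np.ndarray):
    """Return the mouth crop and a face edge map with mouth/nose/chin removed."""
    gray = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)
    face = detector(gray, 1)[0]  # assume one face per training image
    pts = np.array([(p.x, p.y) for p in predictor(gray, face).parts()],
                   dtype=np.int32)

    # 68-point layout: jaw/chin 0-16, nose 27-35, mouth 48-67.
    mouth_pts = pts[48:68]
    x, y, w, h = cv2.boundingRect(mouth_pts)
    mouth_crop = bgr[y:y + h, x:x + w]

    edges = cv2.Canny(gray, 100, 200)  # edge map of the whole face
    # Blank out the lower-face region so the edge image omits mouth/nose/chin.
    lower_face = np.concatenate([pts[3:14], pts[27:36], pts[48:68]])
    cv2.fillConvexPoly(edges, cv2.convexHull(lower_face), 0)
    return mouth_crop, edges
```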
At 103, a generative adversarial network comprising a generative model and a discriminative model is constructed.
In this step, a generative adversarial network comprising a generative model and a discriminative model is constructed, so that once its training is complete, the face generation model for producing high-definition face images is obtained from the generative model in the trained network.
In the generative adversarial network constructed in this step, the role of the generative model is to produce generated samples as similar as possible to the real samples, while the role of the discriminative model is to distinguish real samples from generated samples as reliably as possible. The network is trained through an adversarial game between the two models, so that the generated samples output by the generative model become realistic enough that the discriminative model cannot tell whether a given input is a generated sample or a real sample.
In general, the discriminative model of a generative adversarial network contains only one discriminator, so the discriminative model of the related art cannot attend to both the texture details and the overall quality of the input image. Therefore, to let the discriminative model account for both, the discriminative model constructed in this step is composed of N discriminators, where the input of each discriminator corresponds to image blocks of a different scale in the face image and N is a positive integer greater than or equal to 2. Discriminators of small scales attend more to the texture details of the image, while discriminators of large scales attend more to its overall quality.
For example, the discriminative model constructed in this step may include 3 discriminators, namely discriminator 1, discriminator 2, and discriminator 3, where the input of discriminator 1 may be image blocks of 32 × 32 pixels from the face image, the input of discriminator 2 may be image blocks of 64 × 64 pixels, and the input of discriminator 3 may be image blocks of 128 × 128 pixels.
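A minimal PyTorch sketch of such a combined discriminative model, assuming PatchGAN-style convolutional discriminators and the 32/64/128-pixel scales of the example above (the patent does not specify the per-discriminator architecture):

```python
import torch
import torch.nn as nn

class PatchDiscriminator(nn.Module):
    """A small conv discriminator; one copy is built per input scale."""
    def __init__(self, in_ch: int = 3):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 64, 4, stride=2, padding=1),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 4, stride=2, padding=1),
            nn.InstanceNorm2d(128),
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, 4, stride=1, padding=1),  # patch-wise real/fake map
        )

    def forward(self, x):
        return self.net(x)

class MultiScaleDiscriminator(nn.Module):
    """Discriminative model combining N discriminators, one per patch scale."""
    def __init__(self, scales=(32, 64, 128)):
        super().__init__()
        self.scales = scales
        self.discriminators = nn.ModuleList(PatchDiscriminator() for _ in scales)

    def forward(self, patches):
        # `patches` is a list of N crops, one per scale, from the same image.
        outputs = [d(p) for d, p in zip(self.discriminators, patches)]
        # Concatenate the flattened per-scale outputs as the model's output.
        return torch.cat([o.flatten(1) for o in outputs], dim=1)
```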
At 104, the generative adversarial network is trained on the face images and their corresponding stitched images, and the face generation model is obtained from the generative model in the trained network.
The generative adversarial network composed of the generative model and the discriminative model is trained by alternating updates. Training is considered complete when the whole network converges, and the generative model in the trained network is then taken as the face generation model; input data can be fed through this face generation model to obtain the corresponding high-definition face image.
Specifically, when training the generative adversarial network on the face images and their corresponding stitched images, the following approach may be adopted: take each acquired face image as a real sample; input the corresponding stitched image into the generative model and take its output as a generated sample; feed the real sample and its corresponding generated sample into the discriminative model and derive the loss functions of the discriminative and generative models from the discriminative model's output; and adjust the parameters in the network structures of both models according to these loss functions until the network converges.
It can be understood that if the constructed discriminative model includes N discriminators, the real sample and its corresponding generated sample may be input into the discriminative model as follows: extract N image blocks of different scales from the real sample; extract N image blocks of the same scales from the same positions of the generated sample (for example, a 32 × 32 block from the upper-left corner of the real sample and a 32 × 32 block from the upper-left corner of the generated sample); feed each pair of image blocks of the same scale to the discriminator of that scale; and concatenate the outputs of the per-scale discriminators as the output of the discriminative model.
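The alternating update with same-position, per-scale patches might look as follows (a sketch under assumptions: binary cross-entropy adversarial losses and one random crop position per scale, neither of which the patent fixes):

```python
import torch
import torch.nn.functional as F

def paired_crops(real, fake, scales=(32, 64, 128)):
    """Cut one patch per scale from the same random position of both images."""
    real_p, fake_p = [], []
    _, _, h, w = real.shape
    for s in scales:
        top = torch.randint(0, h - s + 1, (1,)).item()
        left = torch.randint(0, w - s + 1, (1,)).item()
        real_p.append(real[:, :, top:top + s, left:left + s])
        fake_p.append(fake[:, :, top:top + s, left:left + s])
    return real_p, fake_p

def train_step(gen, disc, opt_g, opt_d, stitched, face):
    # Discriminator update: real faces vs. generator output on stitched input.
    fake = gen(stitched).detach()
    real_p, fake_p = paired_crops(face, fake)
    real_out, fake_out = disc(real_p), disc(fake_p)
    d_loss = (F.binary_cross_entropy_with_logits(real_out, torch.ones_like(real_out))
              + F.binary_cross_entropy_with_logits(fake_out, torch.zeros_like(fake_out)))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator update: make the discriminator label generated samples as real.
    fake = gen(stitched)
    _, fake_p = paired_crops(face, fake)
    g_out = disc(fake_p)
    g_loss = F.binary_cross_entropy_with_logits(g_out, torch.ones_like(g_out))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
    return d_loss.item(), g_loss.item()
```

Here `disc` is the MultiScaleDiscriminator sketched above, which accepts the list of per-scale patches.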
Convergence of the generative adversarial network in this step means the minimization of the loss functions of the generative and discriminative models. Optionally, in a specific implementation of this embodiment, the loss function may be considered minimized if the loss values obtained over a preset number of iterations are equal; it may also be considered minimized if the differences between the loss values obtained over a preset number of iterations are less than or equal to a preset threshold; or it may be considered minimized once the number of training iterations exceeds a preset count.
When the loss functions of the generative model and the discriminative model are minimized, that is, when the generative adversarial network converges, training is considered complete, and the generative model in the trained network is taken as the face generation model.
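A simple convergence check along these lines (the window size, tolerance, and step budget are assumed values for the patent's "preset number" and "preset threshold"):

```python
def has_converged(losses: list[float], window: int = 100,
                  eps: float = 1e-4, max_steps: int = 200_000) -> bool:
    """Plateau- or budget-based stand-in for GAN convergence detection."""
    if len(losses) >= max_steps:          # preset iteration budget reached
        return True
    if len(losses) < window:
        return False
    recent = losses[-window:]
    return max(recent) - min(recent) <= eps  # losses stopped changing
```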
Fig. 2 is a flowchart of a method for generating a face image according to an embodiment of the present invention. As shown in Fig. 2, the method includes:
In 201, a mouth image is acquired.
In this step, a mouth image is acquired to serve as part of the input to the face generation model that will produce the face image.
Specifically, this step may acquire the mouth image as follows: acquire a text, where the acquired text may be a single Chinese character or a single letter and different characters correspond to different mouth shapes; convert the acquired text into speech; and generate a mouth image based on the converted speech. The mouth image may also be obtained from a preset image sequence, whose images may be mouth images themselves or images containing a mouth.
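Purely for illustration, a hypothetical character-to-mouth-shape lookup in the spirit of this step (the viseme table and image paths below are invented stand-ins; the patent only states that text is converted to speech and a mouth image is generated from it, or taken from a preset image sequence):

```python
# Hypothetical viseme table mapping characters to preset mouth images.
VISEME_OF_CHAR = {"a": "open_wide", "o": "round", "m": "closed", "e": "spread"}
MOUTH_IMAGE_OF_VISEME = {
    "open_wide": "mouths/open_wide.png",
    "round": "mouths/round.png",
    "closed": "mouths/closed.png",
    "spread": "mouths/spread.png",
}

def mouth_images_for_text(text: str) -> list[str]:
    """Map each character of the input text to a preset mouth-image path."""
    paths = []
    for ch in text.lower():
        viseme = VISEME_OF_CHAR.get(ch, "closed")  # default: closed mouth
        paths.append(MOUTH_IMAGE_OF_VISEME[viseme])
    return paths
```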
In 202, a face edge image of the face in a template image is extracted, and the face edge image and the mouth image are stitched to obtain an input image.
In this step, a face edge image of the face in the template image is extracted and stitched with the mouth image acquired in step 201, and the stitched result is taken as the input image.
It can be understood that when extracting the face edge image from the template image, the eye image and the eyebrow image of the face may also be extracted; the extracted eye image, eyebrow image, face edge image, and mouth image are then stitched, and the stitched result is taken as the input image.
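One way to realize the stitching (a sketch; the patent does not fix a stitching scheme, so pasting the mouth at its landmark-derived location and stacking it with the edge map as channels is an assumption):

```python
import numpy as np

def build_input(edge_map: np.ndarray, mouth_crop: np.ndarray,
                mouth_box: tuple[int, int, int, int]) -> np.ndarray:
    """Stitch the template's face edge image with a mouth crop into one input."""
    x, y, w, h = mouth_box                    # where the mouth sits in the face
    canvas = np.zeros((*edge_map.shape[:2], 3), dtype=np.uint8)
    canvas[y:y + h, x:x + w] = mouth_crop     # paste the mouth in place
    # Channel-wise concatenation: 1 edge channel + 3 mouth channels.
    return np.concatenate([edge_map[..., None], canvas], axis=-1)
```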
In 203, the input image is input into the face generation model obtained by pre-training, and a face image is obtained from the output of the model.
In this step, the input image obtained in step 202 is fed into the pre-trained face generation model, and the face image is obtained from the model's output.
It can be understood that if a plurality of mouth images were acquired in step 201, the following may be performed after the corresponding face images are obtained: combine the face images in a preset order, for example the order of the images in the preset image sequence or the character order of the input text, to obtain a face image sequence; acquire the speech corresponding to each mouth image to obtain a speech sequence; and superpose the speech sequence and the face image sequence synchronously to obtain virtual video data. That is, after high-definition face images are acquired in this step, virtual video data with a high-definition visual effect can further be obtained.
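A sketch of assembling the generated frames into video (writing the silent track with OpenCV; muxing the speech sequence onto it would use an external tool such as ffmpeg, which the patent leaves unspecified):

```python
import cv2
import numpy as np

def frames_to_video(frames: list[np.ndarray],
                    out_path: str = "virtual_face.mp4", fps: int = 25) -> None:
    """Write the generated face image sequence to a video file."""
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(out_path, cv2.VideoWriter_fourcc(*"mp4v"),
                             fps, (w, h))
    for frame in frames:
        writer.write(frame.astype(np.uint8))  # frames in BGR, uint8
    writer.release()
```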
Fig. 3 is a structural diagram of an apparatus for establishing a face generation model according to an embodiment of the present invention. As shown in Fig. 3, the apparatus includes: a first acquisition unit 31, a first stitching unit 32, a construction unit 33, and a training unit 34.
The first acquisition unit 31 is configured to acquire face images.
The first acquisition unit 31 acquires face images for training the generative adversarial network that establishes the face generation model. It may collect face images from the Internet by a web crawler, or extract them from the individual frames of a given video. The manner of acquiring face images is not limited here.
In addition, to enable the established face generation model to generate high-definition face images, the first acquisition unit 31 may further perform the following after acquiring the face images: acquiring the resolution of each face image, and filtering out face images whose resolution is below a preset threshold. That is, the first acquisition unit 31 discards lower-resolution face images so that the face generation model is built from sharper face images.
The first stitching unit 32 is configured to extract an image of a preset part and a face edge image from each face image, and to stitch the extracted images into a stitched image corresponding to that face image, wherein the image of the preset part is a mouth image.
The first stitching unit 32 extracts an image of the preset part and a face edge image from each face image acquired by the first acquisition unit 31, and stitches them into the stitched image corresponding to that face image. The image of the preset part extracted by the first stitching unit 32 is the mouth image in the face image, and may further include the eye image and the eyebrow image.
Since the lower half of the face changes with the mouth shape as the user speaks, the face edge image used by the first stitching unit 32 is an image with the mouth, nose, and chin removed from the face image.
The first stitching unit 32 may extract the eye, eyebrow, and mouth images from the face image using a face keypoint detection technique, and may obtain the face edge image with the mouth, nose, and chin removed using an edge detection technique.
The construction unit 33 is configured to construct a generative adversarial network comprising a generative model and a discriminative model.
The construction unit 33 constructs a generative adversarial network comprising a generative model and a discriminative model, so that once training of the network is complete, the face generation model for producing high-definition face images is obtained from the generative model in the trained network.
In the generative adversarial network constructed by the construction unit 33, the generative model is responsible for producing generated samples as similar as possible to the real samples, while the discriminative model is responsible for distinguishing real samples from generated samples as reliably as possible. The network is trained through an adversarial game between the two models, so that the generated samples output by the generative model become realistic enough that the discriminative model cannot tell whether a given input is a generated sample or a real sample.
In general, the discriminative model of a generative adversarial network contains only one discriminator, so the discriminative model of the related art cannot attend to both the texture details and the overall quality of the input image. Therefore, to let the discriminative model account for both, the discriminative model in the network constructed by the construction unit 33 is composed of N discriminators, where the input of each discriminator corresponds to image blocks of a different scale in the face image and N is a positive integer greater than or equal to 2. Discriminators of small scales attend more to the texture details of the image, while discriminators of large scales attend more to its overall quality.
The training unit 34 is configured to train the generative adversarial network on the face images and their corresponding stitched images, and to obtain the face generation model from the generative model in the trained network.
The training unit 34 trains the generative adversarial network composed of the generative model and the discriminative model by alternating updates. Training is considered complete when the whole network converges, and the generative model in the trained network is then taken as the face generation model; input data can be fed through this face generation model to obtain the corresponding high-definition face image.
Specifically, when training the network on the face images and their corresponding stitched images, the training unit 34 may adopt the following approach: take each acquired face image as a real sample; input the corresponding stitched image into the generative model and take its output as a generated sample; feed the real sample and its corresponding generated sample into the discriminative model and derive the loss functions of the discriminative and generative models from the discriminative model's output; and adjust the parameters in the network structures of both models according to these loss functions until the network converges.
It can be understood that if the constructed discriminative model includes N discriminators, the training unit 34 may input the real sample and its corresponding generated sample into the discriminative model as follows: extract N image blocks of different scales from the real sample; extract N image blocks of the same scales from the same positions of the generated sample; feed each pair of image blocks of the same scale to the discriminator of that scale; and concatenate the outputs of the per-scale discriminators as the output of the discriminative model.
For the training unit 34, convergence of the generative adversarial network means the minimization of the loss functions of the generative and discriminative models. Optionally, the loss function may be considered minimized if the loss values obtained over a preset number of iterations are equal, if the differences between them are less than or equal to a preset threshold, or if the number of training iterations exceeds a preset count.
When the loss functions of the generative model and the discriminative model are minimized, that is, when the generative adversarial network converges, training is considered complete, and the generative model in the trained network is taken as the face generation model.
Fig. 4 is a structural diagram of an apparatus for generating a face image according to an embodiment of the present invention. As shown in Fig. 4, the apparatus includes: a second acquisition unit 41, a second stitching unit 42, and a processing unit 43.
The second acquisition unit 41 is configured to acquire a mouth image.
The second acquisition unit 41 acquires a mouth image, which serves as part of the input to the face generation model that produces the face image.
Specifically, the second acquisition unit 41 may acquire the mouth image as follows: acquire a text, where the acquired text may be a single Chinese character or a single letter and different characters correspond to different mouth shapes; convert the acquired text into speech; and generate a mouth image based on the converted speech. The second acquisition unit 41 may also obtain the mouth image from a preset image sequence, whose images may be mouth images themselves or images containing a mouth.
The second stitching unit 42 is configured to extract a face edge image of the face in a template image, and to stitch the face edge image with the mouth image to obtain an input image.
The second stitching unit 42 extracts a face edge image of the face in the template image and stitches it with the mouth image acquired by the second acquisition unit 41, taking the stitched result as the input image.
It can be understood that when extracting the face edge image from the template image, the second stitching unit 42 may also extract the eye image and the eyebrow image of the face, stitch the extracted eye image, eyebrow image, face edge image, and mouth image, and take the stitched result as the input image.
The processing unit 43 is configured to input the input image into the face generation model obtained by pre-training, and to obtain a face image from the output of the model.
The processing unit 43 feeds the input image obtained by the second stitching unit 42 into the pre-trained face generation model and obtains the face image from the model's output.
It can be understood that if the second acquisition unit 41 acquires a plurality of mouth images, the processing unit 43 may further perform the following after obtaining the corresponding face images: combine the face images in a preset order, for example the order of the images in the preset image sequence or the character order of the input text, to obtain a face image sequence; acquire the speech corresponding to each mouth image to obtain a speech sequence; and superpose the speech sequence and the face image sequence synchronously to obtain virtual video data. That is, after acquiring high-definition face images, the processing unit 43 can further obtain virtual video data with a high-definition visual effect.
As shown in Fig. 5, the computer system/server 012 is embodied as a general-purpose computing device. The components of the computer system/server 012 may include, but are not limited to: one or more processors or processing units 016, a system memory 028, and a bus 018 that couples the various system components, including the system memory 028 and the processing unit 016.
Bus 018 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The computer system/server 012 typically includes a variety of computer-system-readable media. Such media may be any available media accessible by the computer system/server 012, and include both volatile and nonvolatile media, removable and non-removable.
The system memory 028 can include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 030 and/or cache memory 032. The computer system/server 012 may further include other removable/non-removable, volatile/nonvolatile computer-system storage media. By way of example only, the storage system 034 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in Fig. 5, commonly referred to as a "hard drive"). Although not shown in Fig. 5, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be connected to bus 018 via one or more data media interfaces. The memory 028 can include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the present invention.
A program/utility 040 having a set (at least one) of program modules 042 can be stored, for example, in the memory 028. Such program modules 042 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, each of which, or some combination of which, might include an implementation of a network environment. The program modules 042 generally perform the functions and/or methodologies of the embodiments of the present invention described herein.
The computer system/server 012 may also communicate with one or more external devices 014 (e.g., a keyboard, a pointing device, a display 024, etc.). In the present invention, the computer system/server 012 communicates with an external radar device, and may also communicate with one or more devices that enable a user to interact with the computer system/server 012, and/or with any device (e.g., a network card, a modem, etc.) that enables the computer system/server 012 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 022. Moreover, the computer system/server 012 may communicate with one or more networks (e.g., a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) via the network adapter 020. As shown, the network adapter 020 communicates with the other modules of the computer system/server 012 via bus 018. It should be appreciated that, although not shown, other hardware and/or software modules may be used in conjunction with the computer system/server 012, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
The processing unit 016 executes the programs stored in the system memory 028, thereby performing various functional applications and data processing, such as implementing the method flows provided by the embodiments of the present invention.
With the development of technology over time, the meaning of "media" has broadened, and the propagation path of a computer program is no longer limited to tangible media; it may, for example, be downloaded directly from a network. Any combination of one or more computer-readable media may be employed. The computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electromagnetic, optical, or any suitable combination thereof. A computer-readable signal medium may also be any computer-readable medium that is not a computer-readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, as well as conventional procedural programming languages such as the "C" programming language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the latter case, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
By means of the technical solution provided by the invention, the generative adversarial network is trained on stitched images obtained from the images of preset parts and the face edge images extracted from the face images. This fully accounts for the fact that differences in these facial parts during speech affect the face image, so the generative model in the trained network can generate clearer and more realistic face images.
In the embodiments provided in the present invention, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the units is only one logical functional division, and other divisions may be realized in practice.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.
The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to execute some of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (20)

1. A method of establishing a face generation model, the method comprising:
acquiring face images;
extracting an image of a preset part and a face edge image from each face image, and stitching the extracted images into a stitched image corresponding to that face image, wherein the image of the preset part is a mouth image;
constructing a generative adversarial network comprising a generative model and a discriminative model;
and training the generative adversarial network on the face images and their corresponding stitched images, and obtaining the face generation model from the generative model in the trained generative adversarial network.
2. The method of claim 1, further comprising, after acquiring the face images:
acquiring the resolution of each face image, and filtering out face images whose resolution is below a preset threshold.
3. The method of claim 1, wherein the image of the preset part further includes an eye image and an eyebrow image;
and wherein the face edge image is an image with the mouth, nose, and chin of the face removed.
4. The method of claim 1, wherein constructing a generative adversarial network comprising a generative model and a discriminative model comprises:
combining N discriminators into the discriminative model, wherein the input of each discriminator corresponds to image blocks of a different scale, and N is a positive integer greater than or equal to 2.
5. The method of claim 4, wherein training the generative adversarial network on the face images and their corresponding stitched images comprises:
taking each face image as a real sample;
inputting the stitched image into the generative model, and taking the output of the generative model as a generated sample;
taking the real sample and its corresponding generated sample as the input of the discriminative model, and obtaining the loss functions of the discriminative model and the generative model from the output of the discriminative model;
and adjusting parameters in the network structures of the generative model and the discriminative model according to the loss functions until the generative adversarial network converges.
6. The method of claim 5, wherein taking the real sample and its corresponding generated sample as the input of the discriminative model comprises:
extracting N image blocks of different scales from the real sample;
extracting N image blocks of the same scales from the same positions of the generated sample;
and feeding each pair of image blocks of the same scale to the discriminator of the corresponding scale, and concatenating the outputs of the per-scale discriminators as the output of the discriminative model.
7. A method of generating a face image, the method comprising:
acquiring a mouth image;
extracting a face edge image of the face in a template image, and stitching the face edge image with the mouth image to obtain an input image;
and inputting the input image into a face generation model, and obtaining a face image from the output of the face generation model;
wherein the face generation model is pre-established according to the method of any one of claims 1 to 6.
8. The method of claim 7, wherein acquiring a mouth image comprises:
acquiring a text;
converting the text into speech, and generating a mouth image based on the converted speech.
9. The method of claim 7, further comprising:
extracting an eye image and an eyebrow image of the face in the template image;
and stitching the mouth image, the eye image, the eyebrow image, and the face edge image to obtain the input image.
10. An apparatus for establishing a face generation model, the apparatus comprising:
a first acquisition unit configured to acquire face images;
a first stitching unit configured to extract an image of a preset part and a face edge image from each face image, and to stitch the extracted images into a stitched image corresponding to that face image, wherein the image of the preset part is a mouth image;
a construction unit configured to construct a generative adversarial network comprising a generative model and a discriminative model;
and a training unit configured to train the generative adversarial network on the face images and their corresponding stitched images, and to obtain the face generation model from the generative model in the trained generative adversarial network.
11. The apparatus of claim 10, wherein the first acquisition unit further performs, after acquiring the face images:
acquiring the resolution of each face image, and filtering out face images whose resolution is below a preset threshold.
12. The apparatus of claim 10, wherein the image of the preset part further includes an eye image and an eyebrow image;
and wherein the face edge image is an image with the mouth, nose, and chin of the face removed.
13. The apparatus of claim 10, wherein, when constructing the generative adversarial network comprising a generative model and a discriminative model, the construction unit specifically performs:
combining N discriminators into the discriminative model, wherein the input of each discriminator corresponds to image blocks of a different scale, and N is a positive integer greater than or equal to 2.
14. The apparatus of claim 13, wherein, when training the generative adversarial network on the face images and their corresponding stitched images, the training unit specifically performs:
taking each face image as a real sample;
inputting the stitched image into the generative model, and taking the output of the generative model as a generated sample;
taking the real sample and its corresponding generated sample as the input of the discriminative model, and obtaining the loss functions of the discriminative model and the generative model from the output of the discriminative model;
and adjusting parameters in the network structures of the generative model and the discriminative model according to the loss functions until the generative adversarial network converges.
15. The apparatus of claim 14, wherein, when taking the real sample and its corresponding generated sample as the input of the discriminative model, the training unit specifically performs:
extracting N image blocks of different scales from the real sample;
extracting N image blocks of the same scales from the same positions of the generated sample;
and feeding each pair of image blocks of the same scale to the discriminator of the corresponding scale, and concatenating the outputs of the per-scale discriminators as the output of the discriminative model.
16. An apparatus for generating a face image, the apparatus comprising:
a second acquisition unit configured to acquire a mouth image;
a second stitching unit configured to extract a face edge image of the face in a template image, and to stitch the face edge image and the mouth image into an input image;
and a processing unit configured to input the input image into a face generation model and to obtain a face image from the output of the face generation model;
wherein the face generation model is established in advance by the apparatus according to any one of claims 10 to 15.
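[Illustrative sketch, not part of the claims. At generation time the flow reduces to stitch-then-forward-pass; face_edge_image is the hypothetical helper sketched after claim 12, and the tensor layout conversion assumes an H x W x C uint8 input.]

    import numpy as np
    import torch

    def generate_face(model, template, mouth, region_polygons):
        # Overlay the new mouth on the template's face edge image, then
        # run one forward pass of the trained face generation model.
        edge = face_edge_image(template, region_polygons)
        stitched = np.where(mouth.sum(-1, keepdims=True) > 0, mouth, edge)
        x = torch.from_numpy(stitched).permute(2, 0, 1).float().unsqueeze(0)
        with torch.no_grad():
            return model(x)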
17. The apparatus according to claim 16, wherein, when acquiring the mouth image, the second acquisition unit specifically:
acquires a text;
and converts the text into speech and generates the mouth image based on the converted speech.
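[Illustrative sketch, not part of the claims. This unit chains two conversions, and the sketch only shows the shape of that pipeline: tts_engine and mouth_model are hypothetical placeholders for any text-to-speech backend and any speech-to-mouth-image model, with invented method names.]

    def mouth_images_from_text(text, tts_engine, mouth_model):
        # text -> waveform -> one mouth image per audio frame
        waveform = tts_engine.synthesize(text)        # hypothetical API
        return [mouth_model.render(frame)             # hypothetical API
                for frame in waveform.frames()]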
18. The apparatus according to claim 16, wherein the second stitching unit is further configured to:
extract an eye image and an eyebrow image of the face in the template image;
and stitch the mouth image, the eye image, the eyebrow image and the face edge image to obtain the input image.
19. A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the method according to any one of claims 1 to 9.
20. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, carries out the method according to any one of claims 1 to 9.
CN201910556085.4A 2019-06-25 2019-06-25 Method and device for establishing face generation model and generating face image Active CN112132912B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910556085.4A CN112132912B (en) 2019-06-25 2019-06-25 Method and device for establishing face generation model and generating face image

Publications (2)

Publication Number Publication Date
CN112132912A 2020-12-25
CN112132912B 2024-02-13

Family

ID=73849756

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910556085.4A Active CN112132912B (en) 2019-06-25 2019-06-25 Method and device for establishing face generation model and generating face image

Country Status (1)

Country Link
CN (1) CN112132912B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115155058A (en) * 2022-09-06 2022-10-11 北京澜舟科技有限公司 Face pinching method, face pinching system and storage medium

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107609481A (en) * 2017-08-14 2018-01-19 百度在线网络技术(北京)有限公司 The method, apparatus and computer-readable storage medium of training data are generated for recognition of face
US20190050632A1 (en) * 2017-08-14 2019-02-14 Baidu Online Network Technology (Beijing) Co., Ltd . Method and apparatus for generating training data for human face recognition, device and computer storage medium
US20190114748A1 (en) * 2017-10-16 2019-04-18 Adobe Systems Incorporated Digital Image Completion Using Deep Learning
CN108062546A (en) * 2018-02-11 2018-05-22 厦门华厦学院 A kind of computer face Emotion identification system
CN108491775A (en) * 2018-03-12 2018-09-04 维沃移动通信有限公司 A kind of image correcting method and mobile terminal
CN109635745A (en) * 2018-12-13 2019-04-16 广东工业大学 A method of Multi-angle human face image is generated based on confrontation network model is generated
CN109886873A (en) * 2019-01-22 2019-06-14 华中科技大学 A kind of simulated portrait generation method and device based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yao Naiming; Guo Qingpei; Qiao Fengchun; Chen Hui; Wang Hong'an: "Robust Facial Expression Recognition Based on Generative Adversarial Networks", Acta Automatica Sinica, no. 05 *

Also Published As

Publication number Publication date
CN112132912B (en) 2024-02-13

Similar Documents

Publication Publication Date Title
EP3885965B1 (en) Image recognition method based on micro facial expressions, apparatus and related device
CN107563283B (en) Method, device, equipment and storage medium for generating attack sample
CN108091328A (en) Speech recognition error correction method, device and readable medium based on artificial intelligence
CN110798636B (en) Subtitle generating method and device and electronic equipment
CN112541957B (en) Animation generation method, device, electronic equipment and computer readable medium
CN107609463B (en) Living body detection method, living body detection device, living body detection equipment and storage medium
CN110174942B (en) Eye movement synthesis method and device
CN113870395A (en) Animation video generation method, device, equipment and storage medium
US20210118232A1 (en) Method and System for Translating Air Writing To An Augmented Reality Device
CN114187624A (en) Image generation method, image generation device, electronic equipment and storage medium
CN110619334A (en) Portrait segmentation method based on deep learning, architecture and related device
CN109784128A (en) Mixed reality intelligent glasses with text and language process function
CN110188303A (en) Page fault recognition methods and device
US20220292690A1 (en) Data generation method, data generation apparatus, model generation method, model generation apparatus, and program
CN114255737B (en) Voice generation method and device and electronic equipment
CN115049016A (en) Model driving method and device based on emotion recognition
CN117152363A (en) Three-dimensional content generation method, device and equipment based on pre-training language model
AU2015259120A1 (en) Detecting conformance of graphical output data from an application to a convention
CN114049290A (en) Image processing method, device, equipment and storage medium
CN112132912B (en) Method and device for establishing face generation model and generating face image
CN112328088A (en) Image presenting method and device
CN109461203B (en) Gesture three-dimensional image generation method and device, computer equipment and storage medium
CN116703797A (en) Image fusion method, image fusion system, computer device and storage medium
Trujillo-Romero et al. Mexican Sign Language corpus: Towards an automatic translator
CN109857244B (en) Gesture recognition method and device, terminal equipment, storage medium and VR glasses

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant