CN112241708A - Method and apparatus for generating new person image from original person image - Google Patents


Info

Publication number
CN112241708A
CN112241708A
Authority
CN
China
Prior art keywords
image
background
original
foreground
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011120139.1A
Other languages
Chinese (zh)
Inventor
王宝锋
张武强
方志杰
郭子杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Mercedes Benz Group AG
Original Assignee
Daimler AG
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Daimler AG
Priority to CN202011120139.1A
Publication of CN112241708A
Legal status: Pending

Classifications

    • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • G06F 18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
    • G06N 3/045 Combinations of networks
    • G06T 3/04 Context-preserving transformations, e.g. by using an importance map
    • G06T 5/30 Erosion or dilatation, e.g. thinning
    • G06T 7/194 Segmentation; Edge detection involving foreground-background segmentation
    • G06V 10/40 Extraction of image or video features
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/20221 Image fusion; Image merging
    • G06T 2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The invention relates to the field of artificial intelligence, and in particular to a method for generating a new person image from an original person image, comprising: providing a first original person image; acquiring pose key points of the first original person image; segmenting the first original person image into a foreground image and a background image; inputting the foreground image and the pose key point data into a foreground feature extraction model to extract an appearance feature vector and a pose feature vector, and inputting the background image into a background feature extraction model to extract a background feature vector; and inputting the appearance feature vector, the pose feature vector and the background feature vector into a synthesis model to synthesize a reconstructed image. The method further comprises: inputting first and second original person images into a trained neural network model to synthesize a new person image having the appearance of the person in the first original person image and the background and person pose of the second original person image. The invention also relates to an apparatus for generating a new person image from an original person image.

Description

Method and apparatus for generating new person image from original person image
Technical Field
The present invention relates to a method for generating a new person image from an original person image. The invention also relates to an apparatus for generating a new person image from an original person image.
Background
In recent years, with the development of artificial intelligence techniques such as deep learning and neural networks, generative models, represented by the Generative Adversarial Network (GAN) and the Variational Autoencoder (VAE), have advanced greatly and are widely used to generate data such as images and speech.
In the field of image generation, methods for generating person images have also become a research focus. However, current person-generation networks and algorithms usually crop the region containing the person as a rectangular block (or patch) and feed the cropped image into the network for training, and they usually focus only on restoring and reconstructing the foreground (i.e. person) region while neglecting reconstruction of the background. Because background and foreground are not decoupled, such methods on the one hand have poor background reconstruction capability: the generated images have blurred backgrounds in which the specific scene cannot be identified. On the other hand, background pixels dilute the network's computing capacity, so the restoration of the foreground image, especially of high-frequency information such as fine details, cannot be optimal. In addition, because this form of data input cannot effectively control the background, images generated in this way are usually limited to the extent of the person, the semantic consistency between foreground and background is poor, images with full scene information cannot be generated, and the generalization of the generative model to different application scenarios is severely limited.
Furthermore, in the field of video entertainment, for example in the movie and video game production industries, there is a need for character "appearance transfer", i.e. transforming the appearance of the person in image A into the appearance of the person in image B without changing the pose and background of image A. However, it is difficult for existing generative networks to achieve a truly natural appearance transfer.
Therefore, it is desirable to provide a person image generation method that can control the pose, foreground and background of the person image and can generate person images in which pose, foreground and background are well integrated.
Disclosure of Invention
The object of the invention is achieved by providing a method for generating a new person image from an original person image, the method comprising at least the steps of:
i) providing a first original character image;
ii) obtaining pose key point data of the person in the first original person image;
iii) segmenting the first original person image into a foreground image and a background image;
iv) inputting the foreground image and the pose key point data into a foreground feature vector extraction model to extract a character appearance feature vector z_a and a character pose key point feature vector z_c, and inputting the background image into a background feature vector extraction model to extract a background feature vector z_b; and
v) inputting the character appearance feature vector z_a, the character pose key point feature vector z_c and the background feature vector z_b into an image synthesis model to synthesize a reconstructed image of the first original person image.
According to another aspect of the invention, the object of the invention is also achieved by a method for generating a new person image from an original person image, the method comprising at least the steps of:
i') providing a first original personal image and a second original personal image different from the first original personal image;
ii') obtaining pose key point data of the respective person in the first original person image and the second original person image;
iii') segmenting the first original person image and the second original person image into a foreground image and a background image, respectively;
iv') inputting the foreground image of the first original character image and the pose key point data of the second original character image into the foreground feature vector extraction model to extract a character appearance feature vector z_a and a character pose key point feature vector z_c, and inputting the background image of the second original character image into a background feature vector extraction model to extract a background feature vector z_b; and
v') inputting the character appearance feature vector z_a, the character pose key point feature vector z_c and the background feature vector z_b into an image synthesis model to synthesize a new person image, which has the background and character pose of the second original character image and the character appearance of the first original character image.
According to an alternative embodiment of the invention, the foreground feature vector extraction model is configured as a foreground generative network for reconstructing or generating foreground images, and the extracted character appearance feature vector z_a and character pose key point feature vector z_c are dimension-reduced features extracted from the foreground image and the pose key point data within the foreground generative network.
According to an alternative embodiment of the invention, the background feature vector extraction model is configured as a background generative network for reconstructing the background image, and the extracted background feature vector z_b is a dimension-reduced feature extracted from the background image within the background generative network.
According to an alternative embodiment of the invention, the foreground or background generative network model is constructed using any of the following generative networks: generative adversarial neural networks, variational autoencoders, or models derived from them.
According to an alternative embodiment of the invention, step iii) or iii') is performed in the following way:
a) generating a character mask based on the pose key point data;
b) performing image segmentation on the first original character image and the second original character image using the character mask, to generate foreground images containing only the character and background images containing only the background.
According to an alternative embodiment of the invention, step a) is performed in the following manner:
-connecting pose key points to each other based on real human skeletal structure to generate a skeletal binary image;
-performing dilation and/or erosion processing on the skeleton binary image; and
-filling isolated zero-valued void regions in the dilated and/or eroded skeleton binary image in order to generate a human mask.
According to still another aspect of the present invention, the object of the present invention is also achieved by a method for generating a new personal image from an original personal image, the method comprising: inputting the first original character image or the first and second original character images to a trained neural network model composed of a foreground generating network model, a background generating network model and a synthesizing network model to synthesize a new character image; the neural network model is trained in the following way:
i ") providing a training image containing a person;
ii ") obtaining pose key point data of the person in the training image;
iii ") segmenting the training image into a foreground image and a background image;
iv ") inputting the foreground image and the pose key point data into a foreground generation type network model to train the foreground generation type network model, and inputting the background image into a background generation type network model to train the background generation type network model; and
v") extracting, within the foreground generative network model, a character appearance feature vector z_a and a character pose key point feature vector z_c as dimension-reduced features from the foreground image and the pose key point data, and extracting, within the background generative network model, a background feature vector z_b as a dimension-reduced feature from the background image, and inputting them into the synthesis network model to train the synthesis network model.
According to an alternative embodiment of the invention, the foreground generative network model, the background generative network model and the synthesis network model are trained independently, interactively or jointly.
According to a further aspect of the invention, the object of the invention is also achieved by an apparatus for generating a new person image from an original person image, the apparatus comprising a processor and a computer readable storage means communicatively connected to the processor, the computer readable storage means having stored thereon a computer program for implementing the method described herein, when the computer program is executed by the processor.
According to yet another aspect of the invention, the object of the invention is also achieved by an apparatus for generating a new person image from an original person image, said apparatus being configured for implementing the method described herein and comprising:
a pose key point recognition means configured to determine pose key point data of a person in the input original person image;
a person mask generation model configured to generate a person mask;
a front-background segmentation model configured to segment an input original character image into a foreground image and a background image;
a foreground feature vector extraction model configured to extract a character appearance feature vector z_a and a character pose key point feature vector z_c from the foreground image and the pose key point data;
a background feature vector extraction model configured to extract a background feature vector z_b from the background image; and
an image synthesis model configured to synthesize a new person image from the character appearance feature vector z_a, the character pose key point feature vector z_c and the background feature vector z_b.
According to the invention, the following is achieved: during training, the foreground and the background are first decoupled and learned by two independent generative networks, and are then fused by a synthesis network, completing the mixed training of the entire image generation model.
The invention thus provides a pedestrian image generation method based on generative networks and mixed foreground-background training. By decoupling the pedestrian from the background and fusing them again at different training stages, it effectively improves the image quality of both the foreground and the background of the generated person image as well as their semantic consistency, and greatly improves the generalization of the generative model across application scenarios.
Further advantages and advantageous embodiments of the inventive subject matter are apparent from the description, the drawings and the claims.
Drawings
Further features and advantages of the present invention will be further elucidated by the following detailed description of an embodiment thereof, with reference to the accompanying drawings. The attached drawings are as follows:
fig. 1 is a schematic block diagram of an apparatus 100 for generating a new personal image from an original personal image according to an exemplary embodiment of the present invention;
FIG. 2 illustrates a schematic diagram of pose keypoints, according to an exemplary embodiment of the invention;
FIG. 3 shows a flow diagram of an image segmentation process according to an exemplary embodiment of the present invention;
FIG. 4 is a block schematic diagram of a neural network model for generating a new person image from an original person image, according to an exemplary embodiment of the present invention;
FIG. 5 shows a flow diagram of a model training method 200 for training a neural network model, according to an example embodiment of the present invention;
FIG. 6 shows a flowchart of an image segmentation step according to an exemplary embodiment of the present invention;
FIG. 7 shows a flowchart of the steps of generating a human mask according to an exemplary embodiment of the invention;
FIG. 8 illustrates a flowchart of a method for synthesizing a new person image from two original person images, according to an exemplary embodiment of the present invention; and
fig. 9 shows a flowchart of a method for reconstructing an original person image according to an exemplary embodiment of the present invention.
Detailed Description
In order to make the technical problems, technical solutions and advantageous effects of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and exemplary embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit the scope of the invention. In the drawings, the same or similar reference numerals refer to the same or equivalent parts.
Fig. 1 is a schematic block diagram of an apparatus 100 for generating a new personal image from an original personal image according to an exemplary embodiment of the present invention. The device 100 includes a processor 10 and a computer readable storage device 20 communicatively coupled to the processor 10. The computer-readable storage means 20 have stored therein a computer program for implementing the method for generating a person image, which will be explained in detail below, when the computer program is executed by the processor 10.
According to an exemplary embodiment, a display device 30 is provided in communicative connection with the processor 10. By means of the display device 30, the user can view the original personal image to be processed by the device 100 and the new personal image generated by the device 100.
According to an exemplary embodiment, an input device 40 is provided in communicative connection with the processor 10. By means of the input device 40, the user can select or input an original person image to be processed by the device 100. The input device 40 may include, for example: a keyboard, a mouse, and/or a touch screen.
According to an exemplary embodiment, a camera 50 is provided in communicative connection with the processor 10. By means of the camera device 50, the user can take a person image as an original person image to be processed by the device 100. The imaging device 50 is, for example, an in-vehicle imaging device.
According to an exemplary embodiment, a personal image set composed of a plurality of personal images is provided. The original personal image set may be stored in the computer readable storage device 20 or another storage device communicatively connected to the processor 10.
Fig. 4 is a schematic block diagram of a neural network model 400 for generating a new human image from an original human image according to an exemplary embodiment of the present invention.
The neural network model 400 mainly includes: a pose key point recognition model 410 configured to recognize the person's pose key points in the input original person image; a character mask generation model 420 configured to generate a character mask I_Mask that just covers the entire person in the input original person image; a front-background segmentation model 430 configured, for example, to segment the input original person image into a foreground image and a background image based on the character mask I_Mask; a foreground feature vector extraction model 440 configured to extract a character appearance feature vector z_a and a character pose key point feature vector z_c; a background feature vector extraction model 450 configured to extract a background feature vector z_b; and an image synthesis model 460 configured to synthesize a new person image from the character appearance feature vector z_a, the character pose key point feature vector z_c and the background feature vector z_b.
In an example, the foreground feature vector extraction model 440, the background feature vector extraction model 450, and the image synthesis model 460 are configured as a neural network model of a suitable form, such as a generative network model, and in particular as a foreground generative network model, a background generative network model, and a synthetic network model, respectively, trained by the model training method 200, which will be described in detail below with reference to fig. 5.
In one example, when the input original character image is accompanied by annotated pose keypoint information, the pose keypoint recognition model may be omitted.
Fig. 5 shows a flowchart of a model training method 200 for training a neural network model 400 for generating new personal images from original personal images, according to an exemplary embodiment of the present invention.
According to the model training method 200, in step S210, an original character image is provided. Illustratively, the original person image may be any one of the above-mentioned original person image sets. Alternatively, the original person image is a person, such as a pedestrian image, captured by the user by means of the camera 50, such as an in-vehicle camera, or a frame of person image captured from a video stream.
Next, in step S220, pose key point data of the person in the original person image is acquired. Pose key points generally include, but are not limited to: the left and right eyes, left and right ears, nose, mouth, neck, left and right shoulders, left and right hips, left and right elbows, left and right wrists, left and right knees, left and right ankles, etc., as shown by the white dots 50 in fig. 2.
In one example, pose key point data may be obtained by manually annotating the image. In another example, the pose key point data may be computed by inputting the original person image into the pose key point recognition model 410. The pose key point recognition model may be constructed using human pose estimation algorithms such as OpenPose, PifPaf, HR-Net, and the like.
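By way of illustration only, the fragment below obtains person pose key points with torchvision's pre-trained Keypoint R-CNN; it is merely a stand-in for the OpenPose/PifPaf/HR-Net detectors mentioned above and not the patent's model 410, and its 17 COCO key points differ slightly from the key point set shown in fig. 2.

```python
# Hedged sketch: pose key point extraction with a pre-trained Keypoint R-CNN
# (a stand-in for OpenPose / PifPaf / HR-Net; not the patent's actual model 410).
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# torchvision >= 0.13 API; older versions use pretrained=True instead of weights="DEFAULT".
model = torchvision.models.detection.keypointrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def get_pose_keypoints(image_path, score_thresh=0.9):
    """Return an (N, 17, 3) array of [x, y, visibility] key points, one row per detected person."""
    img = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        out = model([img])[0]
    keep = out["scores"] > score_thresh
    return out["keypoints"][keep].cpu().numpy()  # COCO order: nose, eyes, ears, shoulders, ...

# keypoints = get_pose_keypoints("person.jpg")
```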
According to an exemplary embodiment of the present invention, pre-labeling of pose keypoints may be performed for each image in the original character image set. In this case, the original personal image is provided in step S210 together with the pose key point data of the person in the original personal image.
Additionally or alternatively, the labeled original person image set may be divided into a training subset data_train, a validation subset data_val and a test subset data_test.
Then, in step S230, the foreground and background of the original character image are segmented based on the pose key point data acquired in step S220, for example, by means of the character mask generation model 420 and the front-background segmentation model 430 to obtain a foreground image and a background image.
According to an exemplary embodiment, step S220 further comprises (see fig. 6):
in step S221, a character mask I is generated based on the pose key point dataMask
Then, in step S222, the human mask I is usedMaskImage segmentation is performed on the original person image to generate a foreground image containing only persons and a background image containing only the background.
To this end, referring to fig. 3, fig. 3 shows an image segmentation process according to an exemplary embodiment of the present invention, wherein an original person image 31 is segmented into a background image 33 and a foreground image 34 by means of a generated person mask 32.
Further, step S221 may further, exemplarily, include (see fig. 7):
in step S2211, connecting the acquired pose key points to each other based on the real human skeleton structure to generate a character skeleton binary image;
in step S2212, a closing operation (expansion) and then erosion (erosion) processing is performed on the skeleton binary image; and
in step S2213, a filling process (fills) is performed on the expanded and eroded skeleton binary image to fill isolated zero-value hole regions in the skeleton binary image, thereby obtaining a human mask IMask
In an example, the scale of the dilation and/or erosion operator may be determined according to the human skeleton size.
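A minimal sketch of steps S2211 to S2213 using OpenCV is given below; the skeleton connectivity, the line thickness and the kernel size are illustrative assumptions rather than values prescribed by the patent.

```python
# Hedged sketch of steps S2211-S2213: skeleton binary image -> closing (dilation then erosion) -> hole filling.
import cv2
import numpy as np

# Illustrative skeleton: pairs of key point indices to connect (COCO-style indices, assumed).
SKELETON = [(5, 6), (5, 7), (7, 9), (6, 8), (8, 10), (5, 11), (6, 12),
            (11, 12), (11, 13), (13, 15), (12, 14), (14, 16), (0, 5), (0, 6)]

def person_mask(keypoints, image_shape, thickness=15, kernel_size=25):
    """keypoints: (K, 3) array of [x, y, visibility]; returns a binary character mask I_Mask."""
    h, w = image_shape[:2]
    skel = np.zeros((h, w), np.uint8)
    for a, b in SKELETON:                                    # S2211: draw the bones
        if keypoints[a, 2] > 0 and keypoints[b, 2] > 0:
            pa = tuple(int(v) for v in keypoints[a, :2])
            pb = tuple(int(v) for v in keypoints[b, :2])
            cv2.line(skel, pa, pb, 255, thickness)
    k = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    closed = cv2.morphologyEx(skel, cv2.MORPH_CLOSE, k)      # S2212: dilation followed by erosion
    # S2213: fill isolated zero-valued holes enclosed by the silhouette via flood fill from a corner.
    flood = closed.copy()
    cv2.floodFill(flood, np.zeros((h + 2, w + 2), np.uint8), (0, 0), 255)
    return closed | cv2.bitwise_not(flood)
```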
Alternatively, the person mask may be generated by other suitable methods known in the art, such as manual semantic segmentation labeling (i.e., manually labeling the pixels of the original person image that contain the person) or image pre-processing.
According to an exemplary embodiment, the acquired foreground image and background image may be the same size as the original person image, except that the background area in the foreground image is zero-valued, while the foreground area in the background image is zero-valued, as shown in fig. 3.
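Under this convention, the segmentation of step S222 reduces to element-wise masking, e.g. as in the following minimal sketch (assuming a binary person mask of the same height and width as the image):

```python
# Hedged sketch: split an original person image into foreground/background images of equal size,
# zeroing the complementary region in each (as described for fig. 3).
import numpy as np

def split_foreground_background(image, mask):
    """image: (H, W, 3) uint8 array; mask: (H, W) binary character mask I_Mask."""
    m = (mask > 0)[..., None]           # broadcast the mask over the colour channels
    foreground = np.where(m, image, 0)  # person pixels only, background zeroed
    background = np.where(m, 0, image)  # background pixels only, person zeroed
    return foreground, background
```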
Next, in step S240, the foreground image obtained in step S230 and the pose key point data obtained in step S220 are input into a foreground generation type network model to train the foreground generation type network model, wherein the foreground generation type network model is configured to implement reconstruction and generation of a foreground character. The foreground generative network model corresponds to or includes the foreground feature vector extraction model 440 described previously.
In general, the foreground generative network model may be described by the following equation:
x̂_f = G_f(x_a, x_c)
where x_c denotes the input pose key point data, x_a denotes the input foreground image data, and x̂_f denotes the generated or reconstructed foreground image.
According to an exemplary embodiment, the foreground generative network model is constructed using any of the following generative networks: generative adversarial neural networks, variational autoencoders, or models derived from them.
According to an exemplary embodiment, the generator of the foreground generative network model comprises an encoding (encode) and a decoding (decode) process. During training and generation, the generator encodes and dimension-reduces the data x_c and x_a until they reach a bottleneck layer, yielding the dimension-reduced feature vectors z_a and z_c. The features of this layer contain the principal component information of the character's structure and appearance and are highly controllable. The dimension-reduced feature vectors z_a and z_c then enter the decoding process to obtain the generated or reconstructed foreground image x̂_f.
During training, in order to guide the model G_f to converge and to obtain better image restoration and generalization capability, a suitably designed loss function L_Gf constrains the training of the foreground generative network model.
According to an exemplary embodiment of the present invention, the loss function of the foreground generative network model is:
L_Gf = L(x̂_f, x_a) + λ1 · Div(·) + λ2 · GAN(·)
where Div(·) denotes the divergence of the feature vectors, L(·) denotes an L1 or L2 norm loss, GAN(·) denotes a GAN loss, and λ1 and λ2 denote weight parameters. The detailed definitions of these terms are not intended to limit the present invention.
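To make the encoding-bottleneck-decoding structure more concrete, the PyTorch fragment below sketches one possible toy form of such a generator G_f: two small encoders reduce x_a and x_c to bottleneck vectors z_a and z_c, and a decoder reconstructs a 32x32 foreground. The layer sizes, the 18-channel heat-map encoding of the key points and the loss weighting are illustrative assumptions, not the patent's actual architecture.

```python
# Hedged sketch of a foreground generator G_f with an explicit bottleneck (assumed toy architecture).
import torch
import torch.nn as nn

class ForegroundGenerator(nn.Module):
    def __init__(self, z_dim=128):
        super().__init__()
        def enc(in_ch):                       # simple convolutional encoder down to a vector
            return nn.Sequential(
                nn.Conv2d(in_ch, 32, 4, 2, 1), nn.ReLU(),
                nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, z_dim))
        self.enc_appearance = enc(3)          # x_a: foreground image
        self.enc_pose = enc(18)               # x_c: key points rendered as 18 heat-map channels (assumed)
        self.decoder = nn.Sequential(         # bottleneck -> reconstructed 32x32 foreground (toy resolution)
            nn.Linear(2 * z_dim, 64 * 8 * 8), nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose2d(32, 3, 4, 2, 1), nn.Sigmoid())

    def forward(self, x_a, x_c):
        z_a, z_c = self.enc_appearance(x_a), self.enc_pose(x_c)
        x_f_hat = self.decoder(torch.cat([z_a, z_c], dim=1))
        return x_f_hat, z_a, z_c

# Illustrative loss in the spirit of the formula above (GAN term omitted for brevity):
# loss = l1(x_f_hat, x_a) + lambda1 * divergence_term + lambda2 * gan_loss
```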
According to an exemplary embodiment, before the foreground image is input to the generator, image pre-processing is first performed on the foreground image, and the pre-processed foreground image is then input to the generator for the encoding and decoding processes. Illustratively, the image pre-processing comprises a series of operations such as limb cropping, rotation and/or normalization of the foreground image based on the pose key point data. The present invention is not particularly limited with respect to the features and details of the image pre-processing operations.
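Purely as an illustration of pose-guided pre-processing (the patent deliberately leaves the details open), the fragment below rotates a foreground crop so that the shoulder line is horizontal and normalizes the pixel values; the key point indices and the specific operations are assumptions.

```python
# Hedged sketch of pose-guided pre-processing: rotate the foreground so the shoulder line
# is horizontal, then normalize pixel values to [0, 1] (illustrative choices only).
import cv2
import numpy as np

def preprocess_foreground(foreground, keypoints, l_shoulder=5, r_shoulder=6):
    """foreground: (H, W, 3) uint8 image; keypoints: (K, 3) [x, y, visibility] array."""
    (x1, y1), (x2, y2) = keypoints[l_shoulder, :2], keypoints[r_shoulder, :2]
    angle = np.degrees(np.arctan2(y2 - y1, x2 - x1))        # tilt of the shoulder line
    h, w = foreground.shape[:2]
    rot = cv2.getRotationMatrix2D((w / 2, h / 2), angle, 1.0)
    upright = cv2.warpAffine(foreground, rot, (w, h))
    return upright.astype(np.float32) / 255.0                # normalization step
```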
In step S250, the background image acquired in step S230 is input into a background generation type network model to train the background generation type network model, wherein the background generation type network model is configured to be used for realizing reconstruction of the background. The background generated network model may correspond to or include the background feature vector extraction model 450 previously described.
Unlike the foreground-generated network model, the background-generated network model focuses on the reconstruction of the background, and does not need to change or migrate the content contained in the background.
In general, the background generative network model may be described by the following equation:
x̂_b = G_b(x_b)
where x_b denotes the input background image data and x̂_b denotes the reconstructed background image.
According to an exemplary embodiment, the background generative network model is constructed using any of the following generative networks: generative adversarial neural networks, variational autoencoders, or models derived from them.
Without loss of generality, and similarly to the foreground generative network model, the generator of the background generative network model also comprises encoding and decoding processes. During training and generation, the generator first encodes and dimension-reduces the input background image data x_b until it reaches the bottleneck layer, yielding the dimension-reduced feature vector z_b. The features of this layer generally contain the principal component information of the background and are highly controllable. The feature vector z_b then enters the decoding process to obtain the reconstructed background image x̂_b.
Further, during training, in order to guide the model G_b to converge and to achieve better background reconstruction capability, a suitably designed loss function L_Gb constrains the training of the background generative network model.
According to an exemplary embodiment of the present invention, the loss function of the background generative network model is:
L_Gb = L(Φ(x̂_b), Φ(x_b))
where Φ denotes a visual feature vector extractor, which may be a network such as VGG or ResNet, several layers thereof, or the original image pixels themselves; L denotes a similarity equation used to measure the similarity of two visual feature vectors, which may be the known L1 distance equation and/or L2 distance equation.
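As an illustration, one concrete choice of Φ (a truncated VGG16) and L (the L1 distance) could be implemented as follows; the cut-off layer and the use of PyTorch are assumptions, and inputs are assumed to be already normalized.

```python
# Hedged sketch of the background reconstruction loss: L1 distance between
# VGG features of the reconstructed and original background (one possible choice of Phi and L).
import torch
import torch.nn as nn
from torchvision.models import vgg16

class BackgroundPerceptualLoss(nn.Module):
    def __init__(self, layers=16):            # first 16 layers of VGG16 features, assumed cut-off
        super().__init__()
        self.phi = vgg16(weights="DEFAULT").features[:layers].eval()
        for p in self.phi.parameters():
            p.requires_grad_(False)            # Phi acts as a fixed feature extractor
        self.l1 = nn.L1Loss()

    def forward(self, x_b_hat, x_b):
        return self.l1(self.phi(x_b_hat), self.phi(x_b))
```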
According to an exemplary embodiment, the background image acquired in step S230 may be directly input to the generator of the background-generating network model to implement the encoding and decoding processes. Alternatively, before inputting the background image into the generator of the background generating network model, the background image may be first area-planned or cropped to obtain a combination of a series of image blocks (patch) and then the obtained combination of image blocks may be input into the generator to perform the encoding and decoding processes.
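The patch-based alternative mentioned above could, for example, look like the following sketch; the patch size and the non-overlapping grid layout are assumptions.

```python
# Hedged sketch: crop a background image into a combination of fixed-size image blocks (patches)
# before feeding it to the background generator (the patch size is an assumed value).
import numpy as np

def background_patches(background, patch=64):
    """background: (H, W, 3) array; returns a list of (patch, patch, 3) blocks."""
    h, w = background.shape[:2]
    return [background[i:i + patch, j:j + patch]
            for i in range(0, h - patch + 1, patch)
            for j in range(0, w - patch + 1, patch)]
```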
Then, in step S260, the character appearance feature vector z_a and the character pose key point feature vector z_c of the encoder bottleneck layer obtained in step S240, together with the background feature vector z_b of the encoder bottleneck layer obtained in step S250, are input into the synthesis network model to train the synthesis network model.
In general, the synthesis network model may be described by the following equation:
x̂ = G_s(z_a, z_c, z_b)
where x̂ denotes the synthesized person image, which is a reconstruction of the entire original person image.
Further, during the training of the synthesis network model, in order to guide the model G_s to converge and to promote the fusion of foreground and background so that the generated image achieves a good global effect, a suitably designed loss function L_Gs constrains the training of the synthesis network model.
According to an example of the present invention, the loss function of the synthesis network model is:
L_Gs = L(Φ(x̂), Φ(x))
where x denotes the original person image, Φ denotes a visual feature vector extractor, which may be a network such as VGG or ResNet, several layers thereof, or the original image pixels themselves, and L denotes a similarity equation used to measure the similarity of two visual feature vectors, which may be an L1 distance equation and/or an L2 distance equation, etc.
In the model training method according to the present invention, a foreground generative network model for generating or reconstructing a foreground image, a background generative network model for reconstructing a background image, and a synthetic network model may be trained independently, interactively, or jointly. In particular, the training of the three network models may be implemented in any suitable order or in any suitable interactive or joint manner.
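For instance, one admissible schedule, namely independent pre-training of the two generative models followed by training of the synthesis model on their bottleneck features, could be organized as sketched below; the schedule, the helper methods and the epoch counts are purely illustrative placeholders.

```python
# Hedged sketch of one possible training schedule (independent pre-training, then synthesis training).
# All train_step/encode helpers are hypothetical placeholders for the losses and models described above.
def train_all(dataset, g_f, g_b, g_s, epochs_fg=10, epochs_bg=10, epochs_syn=10):
    for _ in range(epochs_fg):                  # stage 1: foreground generative network
        for x_a, x_c, x_b, x in dataset:
            g_f.train_step(x_a, x_c)            # minimizes the foreground loss
    for _ in range(epochs_bg):                  # stage 2: background generative network
        for x_a, x_c, x_b, x in dataset:
            g_b.train_step(x_b)                 # minimizes the background loss
    for _ in range(epochs_syn):                 # stage 3: synthesis network on bottleneck features
        for x_a, x_c, x_b, x in dataset:
            z_a, z_c = g_f.encode(x_a, x_c)
            z_b = g_b.encode(x_b)
            g_s.train_step(z_a, z_c, z_b, x)    # minimizes the synthesis loss against the full image
    return g_f, g_b, g_s
```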
Fig. 8 shows a flowchart of a method 300 for synthesizing a new personal image from two original personal images according to an exemplary embodiment of the present invention. The method 300 may be implemented by inputting two original human images to be synthesized into the neural network model 400 trained by the model training method 200 explained above in connection with fig. 5.
In the method 300, in step S310, a first original person image is provided.
Then, in step S320, pose keypoint data in the first original person image is acquired, for example, by inputting the first original person image into the pose keypoint recognition model 410.
In step S330, the first original character image is segmented into a foreground image and a background image based on the pose key point data acquired in step S320, for example, by means of the character mask generation model 420 and the front-background segmentation model 430.
On the other hand, in step S340, a second original personal image different from the first original personal image is provided.
Next, in step S350, pose keypoint data in the second original person image is acquired, for example, by inputting the second original person image into the pose keypoint recognition model 410.
In step S360, the second original character image is segmented into a foreground image and a background image based on the pose key point data acquired in step S350, for example, by means of the character mask generation model 420 and the front-background segmentation model 430.
Then, in step S370, a character appearance feature vector z_a and a character pose key point feature vector z_c are extracted as dimension-reduced features (e.g., features of the encoder bottleneck layer) from the foreground image of the first original person image and the pose key point data of the second original person image, for example by inputting the foreground image of the first original person image obtained in step S330 and the pose key point data of the second original person image obtained in step S350 into the foreground feature vector extraction model 440.
On the other hand, in step S380, a background feature vector z_b is extracted as a dimension-reduced feature (e.g., a feature of the encoder bottleneck layer) from the background image of the second original person image, for example by inputting the background image of the second original person image acquired in step S360 into the background feature vector extraction model 450.
Next, in step S390, a new person image is synthesized, for example by inputting the character appearance feature vector z_a and the character pose key point feature vector z_c extracted in step S370, together with the background feature vector z_b extracted in step S380, into the image synthesis model 460. The new person image has the background and person pose of the second original person image but the appearance (i.e. look and dress) of the person in the first original person image.
Thus, the method 300 may be essentially understood as a method of transforming the appearance of a person in one image of a person into the appearance of a person in another image of the person while maintaining the background and pose of the person. The method 300 may be used in a variety of situations, such as in the field of data enhancement or video entertainment.
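Putting the pieces together, the inference path of method 300 can be summarized by the following sketch, in which all model arguments are hypothetical callables standing in for models 410 to 460 rather than the patent's concrete implementations.

```python
# Hedged end-to-end sketch of method 300: appearance from image A, pose and background from image B.
def transfer_appearance(image_a, image_b, keypoint_model, mask_fn, split_fn,
                        foreground_encoder, background_encoder, synthesis_net):
    # Steps S310-S360: key points, masks and foreground/background splits for both images.
    kp_a = keypoint_model(image_a)
    kp_b = keypoint_model(image_b)
    fg_a, _ = split_fn(image_a, mask_fn(kp_a, image_a.shape))
    _, bg_b = split_fn(image_b, mask_fn(kp_b, image_b.shape))
    # Step S370: appearance vector from image A's foreground, pose vector from image B's key points.
    z_a, z_c = foreground_encoder(fg_a, kp_b)
    # Step S380: background vector from image B's background.
    z_b = background_encoder(bg_b)
    # Step S390: fuse the three vectors into the new person image.
    return synthesis_net(z_a, z_c, z_b)
```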
FIG. 9 shows a flowchart of a method 500 for reconstructing an image of an original person, according to an example embodiment of the present invention. The method 500 may be implemented by inputting an original human image to be reconstructed into the neural network model 400 trained by the model training method 200 explained above in connection with fig. 5.
In the method 500, in step S510, an original person image is provided.
Then, in step S520, pose keypoint data in the original person image is acquired, for example, by inputting the original person image into the pose keypoint recognition model 410.
In step S530, the original character image is segmented into a foreground image and a background image based on the pose key point data acquired in step S520, for example, by means of the character mask generation model 420 and the front-background segmentation model 430.
Then, in step S540, a character appearance feature vector z_a and a character pose key point feature vector z_c are extracted as dimension-reduced features (e.g., features of the encoder bottleneck layer) from the foreground image and the pose key point data, for example by inputting the foreground image acquired in step S530 and the pose key point data acquired in step S520 into the foreground feature vector extraction model 440.
On the other hand, in step S550, a background feature vector z_b is extracted as a dimension-reduced feature (e.g., a feature of the encoder bottleneck layer) from the background image, for example by inputting the background image acquired in step S530 into the background feature vector extraction model 450.
Next, in step S560, a new person image is synthesized, for example by inputting the character appearance feature vector z_a and the character pose key point feature vector z_c extracted in step S540, together with the background feature vector z_b extracted in step S550, into the image synthesis model 460. The new person image is a reconstruction or restoration of the entire input original person image.
The method 500 may be used in a variety of situations, such as data enhancement or video entertainment.
According to the invention, as the three network models of the foreground, the background and the synthesis are adopted, the generated image not only has vivid foreground and background, but also has natural transition between the foreground and the background and obviously enhanced scene semantic consistency.
Although some embodiments have been described, these embodiments have been presented by way of example only, and are not intended to limit the scope of the invention. The appended claims and their equivalents are intended to cover all such modifications, substitutions and changes as fall within the true scope and spirit of the invention.

Claims (10)

1. A method (500) for generating a new person image from an original person image, the method (500) comprising at least the steps of:
i) providing a first original character image;
ii) obtaining pose key point data of the person in the first original person image;
iii) segmenting the first original person image into a foreground image and a background image;
iv) inputting the foreground image and the pose key point data into a foreground feature vector extraction model (440) to extract a character appearance feature vector z_a and a character pose key point feature vector z_c, and inputting the background image into a background feature vector extraction model (450) to extract a background feature vector z_b; and
v) inputting the character appearance feature vector z_a, the character pose key point feature vector z_c and the background feature vector z_b into an image synthesis model (460) to synthesize a reconstructed image of the first original person image.
2. A method (300) for generating a new person image from an original person image, said method (300) comprising at least the steps of:
i') providing a first original personal image and a second original personal image different from the first original personal image;
ii') obtaining pose key point data of the respective person in the first original person image and the second original person image;
iii') segmenting the first original person image and the second original person image into a foreground image and a background image, respectively;
iv') inputting the foreground image of the first original character image and the pose key point data of the second original character image into a foreground feature vector extraction model (440) to extract a character appearance feature vector z_a and a character pose key point feature vector z_c, and inputting the background image of the second original character image into a background feature vector extraction model (450) to extract a background feature vector z_b; and
v') inputting the character appearance feature vector z_a, the character pose key point feature vector z_c and the background feature vector z_b into an image synthesis model (460) to synthesize a new person image having the background and character pose of the second original character image and the character appearance of the first original character image.
3. The method (300,500) of claim 1 or 2,
the foreground feature vector extraction model (440) is configured as a foreground-generating network for reconstructing or generating foreground images, the extracted character appearance feature vectors
Figure FDA0002731709680000021
And character pose key point feature vectors
Figure FDA0002731709680000022
The feature is a dimension reduction feature extracted from foreground image and posture key point data in a foreground generation type network; and/or
The background feature vector extraction model (450) is configured as a background generating network for reconstructing a background image, the extracted background feature vectors
Figure FDA0002731709680000023
Is a dimension reduction feature extracted from the background image in the background generation type network.
4. The method (300,500) of claim 3, wherein
a foreground or background generative network model is constructed using any one of the following generative networks: generative adversarial neural networks (GAN), variational autoencoders (VAE) and models derived therefrom.
5. The method (300,500) according to any one of the preceding claims, wherein step iii) or iii') is performed in the following way:
a) generating a character mask based on the pose key point data;
b) the first and second original person images are image-segmented using the person mask to generate a foreground image containing substantially only persons and a background image containing substantially only backgrounds.
6. The method (300,500) according to claim 5, wherein step a) is performed in the following manner:
mutually connecting the posture key points based on the real human skeleton structure to generate a skeleton binary image;
performing expansion and/or corrosion treatment on the skeleton binary image; and
isolated zero-valued void regions in the dilated and/or eroded skeleton binary image are filled in to generate a human mask.
7. A method (300,500) for generating a new person image from an original person image, the method comprising: inputting the first original character image or the first and second original character images into a trained neural network model composed of a foreground generation type network model, a background generation type network model and a synthesis network model to synthesize a new character image; the neural network model is trained in the following way:
i ") providing a training image containing a person;
ii ") obtaining pose key point data of the person in the training image;
iii ") segmenting the training image into a foreground image and a background image;
iv ") inputting the foreground image and the pose key point data into a foreground generation type network model to train the foreground generation type network model, and inputting the background image into a background generation type network model to train the background generation type network model; and
v") extracting, within the foreground generative network model, a character appearance feature vector z_a and a character pose key point feature vector z_c as dimension-reduced features from the foreground image and the pose key point data, and extracting, within the background generative network model, a background feature vector z_b as a dimension-reduced feature from the background image, and inputting them into the synthesis network model to train the synthesis network model.
8. The method (300,500) of claim 7,
the foreground generative network model, the background generative network model and the synthetic network model are trained independently, interactively or jointly.
9. An apparatus (100) for generating a new person image from an original person image, the apparatus (100) comprising a processor (10) and a computer readable storage device (20) communicatively connected to the processor (10), the computer readable storage device (20) having stored thereon a computer program for carrying out the method (300,500) according to any one of the preceding claims, when the computer program is executed by the processor (10).
10. An apparatus (400) for generating a new person image from an original person image, the apparatus (400) being configured to implement the method (300,500) according to any one of claims 1-8 and comprising:
a pose key point recognition means (410) configured to determine pose key point data of a person in the input original person image;
a person mask generation model (420) configured to generate a person mask;
a front-background segmentation model (430) configured to segment an input original character image into a foreground image and a background image;
a foreground feature vector extraction model (440) configured to extract a character appearance feature vector z_a and a character pose key point feature vector z_c from the foreground image and the pose key point data;
a background feature vector extraction model (450) configured to extract a background feature vector z_b from the background image; and
an image synthesis model (460) configured to synthesize a new person image from the character appearance feature vector z_a, the character pose key point feature vector z_c and the background feature vector z_b.
CN202011120139.1A 2020-10-19 2020-10-19 Method and apparatus for generating new person image from original person image Pending CN112241708A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011120139.1A CN112241708A (en) 2020-10-19 2020-10-19 Method and apparatus for generating new person image from original person image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011120139.1A CN112241708A (en) 2020-10-19 2020-10-19 Method and apparatus for generating new person image from original person image

Publications (1)

Publication Number Publication Date
CN112241708A true CN112241708A (en) 2021-01-19

Family

ID=74169181

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011120139.1A Pending CN112241708A (en) 2020-10-19 2020-10-19 Method and apparatus for generating new person image from original person image

Country Status (1)

Country Link
CN (1) CN112241708A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112991484A (en) * 2021-04-28 2021-06-18 中国科学院计算技术研究所数字经济产业研究院 Intelligent face editing method and device, storage medium and equipment
CN113919998A (en) * 2021-10-14 2022-01-11 天翼数字生活科技有限公司 Image anonymization method based on semantic and attitude map guidance
WO2023060918A1 (en) * 2021-10-14 2023-04-20 天翼数字生活科技有限公司 Image anonymization method based on guidance of semantic and pose graphs
CN113919998B (en) * 2021-10-14 2024-05-14 天翼数字生活科技有限公司 Picture anonymizing method based on semantic and gesture graph guidance

Similar Documents

Publication Publication Date Title
Wang et al. A state-of-the-art review on image synthesis with generative adversarial networks
CN111340122B (en) Multi-modal feature fusion text-guided image restoration method
Din et al. A novel GAN-based network for unmasking of masked face
CN110222668B (en) Multi-pose facial expression recognition method based on generation countermeasure network
CN110796593A (en) Image processing method, device, medium and electronic equipment based on artificial intelligence
CN112241708A (en) Method and apparatus for generating new person image from original person image
CN114724214B (en) Micro-expression editing method and system based on facial action unit
KR102373606B1 (en) Electronic apparatus and method for image formation, and program stored in computer readable medium performing the same
CN111612687B (en) Automatic makeup method for face image
CN114187165A (en) Image processing method and device
CN114863533A (en) Digital human generation method and device and storage medium
CN114663274A (en) Portrait image hair removing method and device based on GAN network
Tan et al. Style2talker: High-resolution talking head generation with emotion style and art style
CN117333604A (en) Character face replay method based on semantic perception nerve radiation field
Choi et al. Improving diffusion models for virtual try-on
CN115631285B (en) Face rendering method, device, equipment and storage medium based on unified driving
CN112990123B (en) Image processing method, apparatus, computer device and medium
CN116385606A (en) Speech signal driven personalized three-dimensional face animation generation method and application thereof
He et al. Fa-gans: Facial attractiveness enhancement with generative adversarial networks on frontal faces
CN115035219A (en) Expression generation method and device and expression generation model training method and device
CN109657589B (en) Human interaction action-based experiencer action generation method
Jiang et al. Multi-modality deep network for jpeg artifacts reduction
Xia et al. 3D information guided motion transfer via sequential image based human model refinement and face-attention GAN
Gao et al. Complex manga coloring method based on improved Pix2Pix Model
Zheng et al. Attributes and semantic constrained GAN for face sketch-photo synthesis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination