CN111815756A - Image generation method and device, computer readable medium and electronic equipment - Google Patents

Image generation method and device, computer readable medium and electronic equipment

Info

Publication number
CN111815756A
Authority
CN
China
Prior art keywords
image
semantic
target
information
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910296009.4A
Other languages
Chinese (zh)
Inventor
Wei Zhang (张炜)
Tao Mei (梅涛)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201910296009.4A
Publication of CN111815756A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 Manipulating 3D models or images for computer graphics
    • G06T19/20 Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/30 Subject of image; Context of image processing
    • G06T2207/30196 Human being; Person

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Geometry (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

The disclosure provides an image generation method, an image generation device, a computer readable medium and an electronic device, and relates to the technical field of computers. The method comprises the following steps: determining a first semantic image corresponding to the first image; determining a second semantic image according to the first semantic image and the target information; and generating a second image according to the second semantic image, the target information and the first image. The image generation method in the disclosure can overcome the problem of poor image generation effect to a certain extent, and further improve the image generation effect; and, modeling of the mapping relationship between the input image (e.g., the first image) and the output image (e.g., the second image) can be simplified through a semantic conversion process and an image generation process, thereby improving the human body structure reconstruction effect when generating the human figure image.

Description

Image generation method and device, computer readable medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to an image generation method, an image generation apparatus, a computer-readable medium, and an electronic device.
Background
At present, image generation technology is applied more and more widely in the multimedia and computer vision fields; for example, it can be applied to image and video editing, art film production, retail goods display, and the like. Among these, in the display of retail goods (e.g., a clothing display, or a display model wearing clothing), image generation technology may be applied to Person Image Generation. Person image generation is to generate a person image having the pose of a target person based on a current person image and the pose of the target person to be generated.
Specifically, person image generation may include: extracting feature points from the current person image according to the Generative Adversarial Network (GAN) technique in image generation technology, and generating a person image with the target person pose according to the feature points. However, the person attributes (for example, the clothing style or the body shape structure of the person) in a person image generated in this way are likely to differ greatly from the person attributes in the original person image (i.e., the current person image), which may result in a poor image generation effect.
It is to be noted that the information disclosed in the above background section is only for enhancement of understanding of the background of the present disclosure, and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to an image generation method, an image generation apparatus, a computer-readable medium, and an electronic device, which overcome the problem of poor image generation effect at least to some extent, and thereby improve the image generation effect.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows, or in part will be obvious from the description, or may be learned by practice of the disclosure.
A first aspect of the present disclosure provides an image generation method, including: determining a first semantic image corresponding to the first image; determining a second semantic image according to the first semantic image and the target information; and generating a second image according to the second semantic image, the target information and the first image.
In an exemplary embodiment of the present disclosure, the target information includes pose information in the second image.
In an exemplary embodiment of the present disclosure, the image generating method further includes: and storing the first semantic image and the second semantic image as a corresponding relation.
In an exemplary embodiment of the present disclosure, the image generating method further includes: determining a plurality of semantic image groups; each semantic image group comprises at least two semantic images; in the semantic image group, the corresponding posture information of each semantic image is different, and the corresponding semantic information of each semantic image is the same; and training a semantic conversion model according to the plurality of semantic image groups.
In an exemplary embodiment of the present disclosure, determining a plurality of semantic image groups includes: determining semantic images corresponding to the images in each image set; and matching the plurality of semantic images according to the semantic information corresponding to each semantic image to determine a plurality of semantic image groups.
In an exemplary embodiment of the present disclosure, training a semantic conversion model from a plurality of semantic image groups includes: determining a target semantic image group from a plurality of semantic image groups; processing the target posture information and the first target semantic image through a semantic conversion model to generate a third image; the target semantic image group comprises a first target semantic image, and the target posture information corresponds to a second target semantic image in the target semantic image group; determining a numerical value corresponding to the first loss function according to the comparison of the third image and the second target semantic image; and updating the semantic conversion model according to the value corresponding to the first loss function.
In an exemplary embodiment of the present disclosure, determining the second semantic image from the first semantic image and the target information includes: and processing the first semantic image and the target information through the updated semantic conversion model to determine a second semantic image.
In an exemplary embodiment of the present disclosure, each image set includes an image and pose information to be generated corresponding to the image, and the image generation method further includes: processing the target image in the target image set and the posture information to be generated corresponding to the target image according to the semantic conversion model and the image generation model to generate a fourth image, wherein the fourth image corresponds to the posture information to be generated corresponding to the target image; wherein the plurality of image sets comprise a target image set; processing the posture information corresponding to the fourth image and the target image according to the semantic conversion model and the image generation model to generate a fifth image, wherein the fifth image corresponds to the posture information of the target image; determining a numerical value corresponding to the second loss function according to the comparison between the target image and the fifth image; and updating the semantic conversion model and the image generation model according to the numerical value corresponding to the second loss function.
In an exemplary embodiment of the present disclosure, generating the second image from the second semantic image, the target information, and the first image includes: and processing the second semantic image, the target information and the first image through the updated image generation model to generate a second image.
In an exemplary embodiment of the disclosure, the first loss function includes a first adversarial loss term and a cross-entropy loss term.
In an exemplary embodiment of the disclosure, the second loss function includes a second adversarial loss term, a pose loss term, a content consistency loss term, a semantically guided style loss term, and a face loss term.
According to a second aspect of the present disclosure, there is provided an image generating apparatus including a semantic decomposition unit, a semantic conversion unit, and an image generating unit, wherein: the semantic decomposition unit is used for determining a first semantic image corresponding to the first image; the semantic conversion unit is used for determining a second semantic image according to the first semantic image and the target information; and the image generating unit is used for generating a second image according to the second semantic image, the target information and the first image.
In an exemplary embodiment of the present disclosure, the target information includes pose information in the second image.
In an exemplary embodiment of the present disclosure, the image generation apparatus further includes a semantic image storage unit, wherein: and the semantic image storage unit is used for storing the first semantic image and the second semantic image as a corresponding relation.
In an exemplary embodiment of the present disclosure, the image generation apparatus further includes a semantic image group determination unit and a semantic conversion model training unit, wherein: a semantic image group determining unit configured to determine a plurality of semantic image groups; each semantic image group comprises at least two semantic images; in the semantic image group, the corresponding posture information of each semantic image is different, and the corresponding semantic information of each semantic image is the same; and the semantic conversion model training unit is used for training the semantic conversion model according to the plurality of semantic image groups.
In an exemplary embodiment of the present disclosure, the semantic image group determining unit determines the plurality of semantic image groups in a specific manner as follows: the semantic image group determining unit determines semantic images corresponding to the images in each image set; the semantic image group determining unit matches the plurality of semantic images according to the semantic information corresponding to each semantic image to determine the plurality of semantic image groups.
In an exemplary embodiment of the disclosure, the manner in which the semantic conversion model training unit trains the semantic conversion model according to the plurality of semantic image groups is specifically: the semantic conversion model training unit determines a target semantic image group from a plurality of semantic image groups; the semantic conversion model training unit processes the target posture information and the first target semantic image through a semantic conversion model to generate a third image; the target semantic image group comprises a first target semantic image, and the target posture information corresponds to a second target semantic image in the target semantic image group; the semantic conversion model training unit determines a numerical value corresponding to the first loss function according to comparison of the third image and the second target semantic image; and the semantic conversion model training unit updates the semantic conversion model according to the value corresponding to the first loss function.
In an exemplary embodiment of the disclosure, the manner of determining the second semantic image according to the first semantic image and the target information by the semantic conversion unit is specifically: and the semantic conversion unit processes the first semantic image and the target information through the updated semantic conversion model to determine a second semantic image.
In an exemplary embodiment of the present disclosure, each image set includes an image and pose information to be generated corresponding to the image, and the image generation apparatus further includes: an image processing unit and a loss function determination unit, wherein:
the image processing unit is used for processing the target image in the target image set and the to-be-generated attitude information corresponding to the target image according to the semantic conversion model and the image generation model so as to generate a fourth image, and the fourth image corresponds to the to-be-generated attitude information corresponding to the target image; wherein the plurality of image sets comprise a target image set; the image processing unit is further used for processing the fourth image and the posture information corresponding to the target image according to the semantic conversion model and the image generation model to generate a fifth image, and the fifth image corresponds to the posture information of the target image; the loss function determining unit is used for determining a numerical value corresponding to the second loss function according to comparison of the target image and the fifth image; and the image generation unit is also used for updating the semantic conversion model and the image generation model according to the numerical value corresponding to the second loss function.
In an exemplary embodiment of the disclosure, the manner in which the image generating unit generates the second image according to the second semantic image, the target information, and the first image is specifically: the image generation unit processes the second semantic image, the target information, and the first image through the updated image generation model to generate a second image.
In an exemplary embodiment of the disclosure, the first loss function includes a first adversarial loss term and a cross-entropy loss term.
In an exemplary embodiment of the disclosure, the second loss function includes a second adversarial loss term, a pose loss term, a content consistency loss term, a semantically guided style loss term, and a face loss term.
According to a third aspect of the present disclosure, there is provided a computer readable medium, on which a computer program is stored, which program, when executed by a processor, implements the image generation method as described in the first aspect of the embodiments above.
According to a fourth aspect of the present disclosure, there is provided an electronic device comprising: one or more processors; storage means for storing one or more programs which, when executed by one or more processors, cause the one or more processors to carry out the image generation method as described in the first aspect of the embodiments above.
The technical scheme provided by the disclosure can comprise the following beneficial effects:
in the technical solution provided by the embodiment of the present disclosure, a terminal device or a server may determine a first semantic image corresponding to a first image (e.g., an original person image), where the first semantic image may represent a human body structure in the first image; further, a second semantic image is determined according to the first semantic image and the target information (such as the pose information of the person in the image to be generated), for example, if the pose of the person in the first image is a forward standing state, the target information may be a side standing information, and the second semantic image is used for representing the human body structure when the person stands side; further, a second image may be generated from the second semantic image, the target information, and the first image, for example, in which the person is in a lateral standing position. According to the scheme, on one hand, the problem of poor image generation effect can be overcome to a certain extent, and the image generation effect is improved; on the other hand, the modeling of the mapping relation between the input image and the output image can be simplified through a semantic conversion process and an image generation process, and the human body structure reconstruction effect in the process of generating the character image can be improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure. It is to be understood that the drawings in the following description are merely exemplary of the disclosure, and that other drawings may be derived from those drawings by one of ordinary skill in the art without the exercise of inventive faculty. In the drawings:
FIG. 1 shows a flow diagram of an image generation method according to an exemplary embodiment of the present disclosure;
FIG. 2 illustrates a flow diagram for training a semantic conversion model from a plurality of sets of semantic images according to an exemplary embodiment of the present disclosure;
FIG. 3 illustrates a flow diagram for jointly training a semantic conversion model and an image generation model according to an exemplary embodiment of the present disclosure;
FIG. 4 is a schematic diagram illustrating a generation process corresponding to an image generation method according to an exemplary embodiment of the present disclosure;
FIG. 5 shows a schematic diagram of an application of an image generation method according to an exemplary embodiment of the present disclosure;
FIG. 6 shows a block diagram of an image generation apparatus according to an exemplary embodiment of the present disclosure;
FIG. 7 illustrates a schematic structural diagram of a computer system suitable for use with the electronic device used to implement an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the disclosure. One skilled in the relevant art will recognize, however, that the subject matter of the present disclosure can be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations, or operations have not been shown or described in detail to avoid obscuring aspects of the disclosure.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Referring to fig. 1, fig. 1 shows a flowchart of an image generation method according to an exemplary embodiment of the present disclosure, which may be implemented by a server or a terminal device.
As shown in fig. 1, an image generation method according to an embodiment of the present disclosure includes steps S110, S120, and S130, in which:
step S110: a first semantic image corresponding to the first image is determined.
Step S120: and determining a second semantic image according to the first semantic image and the target information.
Step S130: and generating a second image according to the second semantic image, the target information and the first image.
The following describes the steps in detail:
in step S110, the first image may include personal information, such as human body structure information, personal face information, clothing texture information, and posture information. The first semantic image corresponding to the first image may be used to represent the human body structure of the person in the first image, for example, the head, arms, legs, torso, and hair of the person in the first image are respectively represented by different regions in the first semantic image, and the proportion and the position of each part of the human body structure in the first image are the same as the proportion and the position in the first semantic image.
In an exemplary embodiment of the present disclosure, optionally, the manner of determining the first semantic image corresponding to the first image may specifically be: performing semantic decomposition on the first image through a semantic decomposition model to obtain the first semantic image corresponding to the first image. The training data set of the semantic decomposition model may include at least one of DeepFashion, Market-1501, Pascal VOC 2012, Cityscapes, Pascal Context, and the Stanford Background Dataset; DeepFashion is a clothing data set, Market-1501 is a person data set, Pascal VOC 2012 is a data set including 20 categories such as persons and vehicles, Cityscapes is a landscape data set, and Pascal Context and the Stanford Background Dataset are indoor and outdoor scene data sets. For example, if the first image is a human image, the training data set of the semantic decomposition model may be Pascal VOC 2012, and the semantic decomposition model may be a human semantic decomposition model; if the first image is an article image, the training data set of the semantic decomposition model may be Pascal VOC 2012, and the semantic decomposition model may be an article semantic decomposition model; if the first image is a landscape image, the training data set of the semantic decomposition model may be Cityscapes, and the semantic decomposition model may be a landscape semantic decomposition model.
In addition, since an image is composed of pixels, semantic decomposition groups and partitions the pixels according to the different semantic meanings they express in the image. The goal of semantic segmentation of an image is to assign a semantic label to each pixel in the image, where the semantic labels typically cover a range of object classes (e.g., people, dogs, buses, and bicycles) and background components (e.g., sky, roads, buildings, and mountains). The result of semantic segmentation is a class segmentation mask predicted for each pixel in the image, which describes the image content more comprehensively than the image-level class labels obtained by image classification or the object boxes predicted by object detection. In addition, optionally, the above semantic decomposition of the first image by the semantic decomposition model may specifically be: semantically decomposing the first image by a semantic decomposition model based on a fully convolutional network (FCN) or a deep convolutional neural network (CNN).
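As an illustration of this semantic decomposition step, the following is a minimal sketch that uses a generic pretrained torchvision FCN as a stand-in for the semantic decomposition model; the actual network, its training set (e.g., DeepFashion), and its label space are not specified by the disclosure, so the model choice here is an assumption.

```python
import torch
from torchvision import models, transforms
from PIL import Image

def decompose(image_path: str) -> torch.Tensor:
    """Return a per-pixel class-label map (the 'first semantic image')."""
    net = models.segmentation.fcn_resnet50(weights="DEFAULT").eval()
    preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225]),
    ])
    x = preprocess(Image.open(image_path).convert("RGB")).unsqueeze(0)
    with torch.no_grad():
        out = net(x)["out"]               # (1, num_classes, H, W) logits
    return out.argmax(dim=1).squeeze(0)   # (H, W) semantic label map
```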
In step S120, the target information may include pose information in the second image. The target information may be image information, and the image may be a feature image including feature points corresponding to the pose. The pose information in the target information is pose information in the image to be generated (i.e., the second image) and is different from the pose information in the first image, for example, if the first image is a human image and the pose information in the first image is a human forward standing pose, the target information may be other poses besides the human forward standing pose, such as a human side standing pose, and the pose information in the generated second image includes a human side standing pose; if the first image is an article image, and the posture information in the first image is the article forward placing posture, the target information can be other postures except the article forward placing posture, such as an article side-to-side placing posture, and the generated posture information in the second image comprises the article side-to-side placing posture; if the first image is a landscape image and the pose information in the first image is a forward shooting pose, the target information may be a pose other than the forward shooting pose, such as a sideways shooting pose, and the pose information in the generated second image includes the sideways shooting pose.
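Since the target information may take the form of a feature image containing the feature points of the pose, the sketch below shows one common way to render such a feature image: one Gaussian heatmap channel per keypoint. The keypoint count, image size, and sigma are illustrative assumptions, not values fixed by the disclosure.

```python
import numpy as np

def pose_to_feature_image(keypoints, height, width, sigma=6.0):
    """keypoints: list of (x, y) pixel coordinates, one per body joint."""
    ys, xs = np.mgrid[0:height, 0:width]
    maps = [np.exp(-((xs - kx) ** 2 + (ys - ky) ** 2) / (2.0 * sigma ** 2))
            for kx, ky in keypoints]
    return np.stack(maps, axis=0)  # (num_keypoints, H, W) feature image

# e.g. a hypothetical 18-keypoint side-standing pose for a 256x256 image:
# target_info = pose_to_feature_image(side_standing_keypoints, 256, 256)
```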
In step S130, the second image and the first image may be two images having different pose information but the same semantic information, wherein the semantic information may include semantic tags and image attributes. In addition, a second semantic image corresponding to the second image may be used to represent the human anatomy of the person in the second image.
In an exemplary embodiment of the present disclosure, optionally, a manner of generating the second image according to the second semantic image, the target information, and the first image may specifically be: and rendering the second semantic image according to the target information and the first image to obtain a second image. Wherein the image attributes in the second image are the same as the image attributes in the first image. For example, if the first image is a human image, the image attributes such as human body structure and clothes texture in the second image are generated to be the same as those of the first image.
In an exemplary embodiment of the present disclosure, optionally, the image generating method may further include the steps of: determining the similarity between a second image and an original image, wherein the posture information and the semantic information of the second image are the same as those of the original image; and if the similarity is higher than the preset similarity, judging that the image generation is successful, and if the similarity is lower than the preset similarity, judging that the image generation is failed.
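The disclosure does not fix a similarity metric or the preset similarity value for this success check; the sketch below uses SSIM from scikit-image as one plausible metric, with a hypothetical threshold of 0.8.

```python
import numpy as np
from skimage.metrics import structural_similarity as ssim

def generation_succeeded(generated: np.ndarray, original: np.ndarray,
                         threshold: float = 0.8) -> bool:
    """Compare the second image with an original image that has the same
    pose and semantic information; succeed if similarity clears the bar."""
    score = ssim(generated, original, channel_axis=-1, data_range=255)
    return score >= threshold
```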
In addition, an image generation method according to the present disclosure may be applied to generation of a human image, or may be applied to generation of an animal image, a plant image, or an article image, and an embodiment of the present disclosure is not limited thereto.
Therefore, by implementing the image generation method shown in fig. 1, the problem of poor image generation effect can be overcome to a certain extent, and the image generation effect is improved; and the modeling of the mapping relation between the first image and the second image can be simplified through a semantic conversion process and an image generation process, so that the human body structure reconstruction effect when the person image is generated is improved.
As an exemplary embodiment, the image generating method may further include the steps of: and storing the first semantic image and the second semantic image as a corresponding relation.
In an exemplary embodiment of the present disclosure, the semantic information in the first semantic image is the same as the semantic information in the second image.
Therefore, by implementing the exemplary embodiment, the first semantic image and the second semantic image can be stored as the corresponding relation, so that the same image can be conveniently called when being decomposed, and the image generation efficiency is further improved; and the semantic conversion model used for semantic conversion can be trained as a training sample so as to improve the semantic conversion efficiency and the semantic conversion effect.
As another exemplary embodiment, the image generation method may further include the steps of: determining a plurality of semantic image groups; each semantic image group comprises at least two semantic images; in the semantic image group, the corresponding posture information of each semantic image is different, and the corresponding semantic information of each semantic image is the same; and training a semantic conversion model according to the plurality of semantic image groups.
In an exemplary embodiment of the present disclosure, if each semantic image group includes two semantic images, the expression corresponding to the plurality of semantic image groups may be:

$\{(S_{P_s}^i, S_{P_t}^i)\}_{i=1}^{N}$

where $S_{P_s}^i$ and $S_{P_t}^i$ may be semantic maps with different pose information; the subscript $P_s$ of $S_{P_s}^i$ is used to represent the pose of the semantic map $S_{P_s}^i$, and the subscript $P_t$ of $S_{P_t}^i$ is used to represent the pose of the semantic map $S_{P_t}^i$. Each semantic image group includes at least two semantic images $S_{P_s}^i$ and $S_{P_t}^i$; when the superscripts $i$ in $S_{P_s}^i$ and $S_{P_t}^i$ take the same value, the pose information corresponding to $S_{P_s}^i$ and $S_{P_t}^i$ differs, while the semantic information corresponding to $S_{P_s}^i$ and $S_{P_t}^i$ is the same.
In an exemplary embodiment of the present disclosure, optionally, after training the semantic conversion model according to the plurality of semantic image groups, determining the second semantic image according to the first semantic image and the target information includes: and processing the first semantic image and the target information through the trained semantic conversion model to determine a second semantic image.
Therefore, by implementing the exemplary embodiment, the semantic conversion model can be trained through the determined multiple semantic image groups, and the effect and efficiency of image semantic conversion are further improved.
As another exemplary embodiment, the above-mentioned determining a plurality of semantic image groups may include the following steps: determining semantic images corresponding to the images in each image set; and matching the plurality of semantic images according to the semantic information corresponding to each semantic image to determine a plurality of semantic image groups.
In an exemplary embodiment of the present disclosure, an image set may include an image and target information, where the target information is used to represent the pose information in another image to be generated from the image. There may be a plurality of image sets, each of which may be a data set, and the expression corresponding to the plurality of image sets may be:

$\{(I_{P_s}^i, p_t^i)\}_{i=1}^{N}$

where $I_{P_s}^i$ may be an image, the subscript $P_s$ of $I_{P_s}^i$ is used to represent the pose information in the image, and $p_t^i$ may be the pose information in another image to be generated based on the image $I_{P_s}^i$.

In an exemplary embodiment of the present disclosure, after determining the semantic image corresponding to the image in each image set, the obtained plurality of semantic images may be expressed as:

$\{(S_{P_s}^i, p_t^i)\}_{i=1}^{N}$
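The matching step can be pictured as grouping semantic maps that share the same semantic information but differ in pose. The sketch below assumes each record carries an identity key standing in for "same semantic information"; the actual matching criterion is not detailed in the disclosure.

```python
from collections import defaultdict

def build_semantic_image_groups(records):
    """records: iterable of dicts like
    {"semantic_map": ..., "pose_id": ..., "identity": ...}."""
    by_identity = defaultdict(dict)
    for r in records:
        # same identity key => same semantic information; one map per pose
        by_identity[r["identity"]][r["pose_id"]] = r["semantic_map"]
    # a valid group needs at least two semantic maps with different poses
    return [list(poses.values())
            for poses in by_identity.values() if len(poses) >= 2]
```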
therefore, by implementing the exemplary embodiment, a plurality of semantic image groups can be determined by matching the semantic graphs corresponding to the images in the image set, so that the training effect on the semantic conversion model is improved, and the semantic conversion efficiency is further improved.
Referring to fig. 2 as yet another exemplary embodiment, fig. 2 illustrates a flowchart of training a semantic conversion model according to a plurality of semantic image groups according to an exemplary embodiment of the disclosure. As shown in fig. 2, training the semantic conversion model according to the plurality of semantic image groups may include step S210, step S220, step S230, and step S240, wherein:
step S210: and determining a target semantic image group from the plurality of semantic image groups.
Step S220: processing the target posture information and the first target semantic image through a semantic conversion model to generate a third image; the target semantic image group comprises a first target semantic image, and the target posture information corresponds to a second target semantic image in the target semantic image group.
Step S230: and determining a numerical value corresponding to the first loss function according to the comparison of the third image and the second target semantic image.
Step S240: and updating the semantic conversion model according to the value corresponding to the first loss function.
In an exemplary embodiment of the present disclosure, optionally, the manner of determining the target semantic image group from the plurality of semantic image groups may specifically be: randomly determining a semantic image group from the plurality of semantic image groups as the target semantic image group; or determining a preset semantic image group from the plurality of semantic image groups as the target semantic image group. Further, the target pose information may be the pose information of a second target semantic image in the target semantic image group, the second target semantic image differing from the first target semantic image in pose information but having the same semantic information. In addition, the third image may be a semantic image; unlike the third image, which is produced by the semantic conversion model, the second target semantic image is obtained by directly decomposing its corresponding original image with the semantic decomposition model.
In an exemplary embodiment of the present disclosure, the expression corresponding to the target semantic image group may be:

$(S_{P_s}, S_{P_t}) \sim \{(S_{P_s}^i, S_{P_t}^i)\}_{i=1}^{N}$

where $(S_{P_s}, S_{P_t})$ is determined from $\{(S_{P_s}^i, S_{P_t}^i)\}_{i=1}^{N}$, i.e., the target semantic image group is determined from the plurality of semantic image groups described above. In addition, $S_{P_s}$ may be the first target semantic image, $S_{P_t}$ may be the second target semantic image, and the pose $p_t$ of $S_{P_t}$ may serve as the target pose information. Processing $S_{P_s}$ and $p_t$ through the semantic conversion model ($H_S$) can yield $\hat{S}_{P_t} = H_S(S_{P_s}, p_t)$; $\hat{S}_{P_t}$ may be the third image. By comparing $\hat{S}_{P_t}$ and $S_{P_t}$, the value corresponding to the first loss function can be determined.
In an exemplary embodiment of the present disclosure, optionally, the manner of updating the semantic conversion model according to the value corresponding to the first loss function may specifically be: and determining the gradient of the network parameters according to the numerical value corresponding to the first loss function, and updating the network parameters of the semantic conversion model according to a back propagation algorithm.
In an exemplary embodiment of the present disclosure, optionally, the method further includes the following step: repeatedly executing steps S210 to S240 until the semantic conversion model converges, at which point the training of the semantic conversion model is judged to be complete; here, convergence of the semantic conversion model may also be understood as convergence of the first loss function.
In an exemplary embodiment of the present disclosure, the first loss function may include a first adversarial loss term and a cross-entropy loss term. Specifically, the first loss function $L_S$ may be:

$L_S = L_{adv}^S + L_{ce}$

where $L_{adv}^S$ may be the first adversarial loss term described above, and $L_{adv}^S$ may be:

$L_{adv}^S = L_{adv}(H_S, D_S, S_{P_t}, \hat{S}_{P_t})$

$D_S$ in the above expression may be the discriminator of the semantic conversion model. Specifically, $L_{adv}(G, D, X, Y) = E_X[\log D(X)] + E_Y[\log(1 - D(Y))]$, where $G$ may be $H_S$ described above, $D$ may be $D_S$ described above, $X$ may be $S_{P_t}$ described above, and $Y$ may be $\hat{S}_{P_t}$ described above. $L_{ce}$ may be the cross-entropy loss term described above, i.e., the pixel-wise cross entropy between the converted semantic image $\hat{S}_{P_t}$ and the second target semantic image $S_{P_t}$:

$L_{ce} = -\sum_{p} \sum_{c} S_{P_t}(p, c) \log \hat{S}_{P_t}(p, c)$

where $p$ ranges over pixels and $c$ over semantic classes.
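Under the assumption that $H_S$ and $D_S$ are PyTorch modules and semantic maps are class-logit tensors, one generator update for steps S210 to S240 might look like the sketch below; the architectures, the optimizer, and the separate discriminator update are left open by the disclosure.

```python
import torch
import torch.nn.functional as F

def semantic_conversion_step(H_S, D_S, opt_g, s_src, p_tgt, s_tgt_labels):
    """One generator update for steps S210-S240.
    s_src: (B, C, H, W) first target semantic map; p_tgt: target pose tensor;
    s_tgt_labels: (B, H, W) class indices of the second target semantic map."""
    s_fake_logits = H_S(s_src, p_tgt)                    # the "third image"
    # generator side of the first adversarial loss term (D_S is updated
    # separately to maximize log D(real) + log(1 - D(fake)))
    l_adv = -torch.log(D_S(torch.softmax(s_fake_logits, dim=1)) + 1e-8).mean()
    # cross-entropy term comparing against the second target semantic image
    l_ce = F.cross_entropy(s_fake_logits, s_tgt_labels)
    loss = l_adv + l_ce                                  # L_S = L_adv^S + L_ce
    opt_g.zero_grad()
    loss.backward()                                      # gradients via backprop
    opt_g.step()                                         # update H_S parameters
    return loss.item()
```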
therefore, by implementing the exemplary embodiment shown in fig. 2, the network parameters in the semantic conversion model can be modified through the loss function, so that the semantic conversion effect on the image is improved.
As still another exemplary embodiment, the above-mentioned determining the second semantic image according to the first semantic image and the object information may include the steps of: and processing the first semantic image and the target information through the updated semantic conversion model to determine a second semantic image.
As still another exemplary embodiment, each image set may include an image and pose information to be generated corresponding to the image. Referring to fig. 3, fig. 3 is a schematic flow chart illustrating the joint training of the semantic conversion model and the image generation model according to an exemplary embodiment of the disclosure. As shown in fig. 3, the jointly training semantic conversion model and the image generation model may include step S310, step S320, step S330, and step S340, wherein:
step S310: processing the target image in the target image set and the posture information to be generated corresponding to the target image according to the semantic conversion model and the image generation model to generate a fourth image, wherein the fourth image corresponds to the posture information to be generated corresponding to the target image; wherein the plurality of image sets includes a target image set.
Step S320: and processing the posture information corresponding to the fourth image and the target image according to the semantic conversion model and the image generation model to generate a fifth image, wherein the fifth image corresponds to the posture information of the target image.
Step S330: and determining a numerical value corresponding to the second loss function according to the comparison of the target image and the fifth image.
Step S340: and updating the semantic conversion model and the image generation model according to the numerical value corresponding to the second loss function.
In an exemplary embodiment of the present disclosure, the expression corresponding to the target image set may be:

$(I_{P_s}, p_t) \sim \{(I_{P_s}^i, p_t^i)\}_{i=1}^{N}$

where the plurality of image sets $\{(I_{P_s}^i, p_t^i)\}_{i=1}^{N}$ includes $(I_{P_s}, p_t)$; $I_{P_s}$ may be the target image, and $p_t$ may be the pose information to be generated corresponding to the target image. Optionally, before step S310, the terminal device or the server may decompose the image $I_{P_s}$ through the semantic decomposition model to obtain the semantic map $S_{P_s}$ corresponding to the image $I_{P_s}$. Further, processing $S_{P_s}$ and $p_t$ through the semantic conversion model ($H_S$) yields the semantic map $\hat{S}_{P_t}$, and processing $I_{P_s}$, $p_t$ and $\hat{S}_{P_t}$ through the image generation model ($H_A$) yields $\hat{I}_{P_t}$; $\hat{I}_{P_t}$ may be the fourth image described above. Further, processing $\hat{S}_{P_t}$ and $p_s$ through the semantic conversion model ($H_S$) can yield the semantic map $\hat{S}_{P_s}$, and processing $\hat{I}_{P_t}$, $p_s$ and $\hat{S}_{P_s}$ through the image generation model ($H_A$) can yield $\hat{I}_{P_s}$; $\hat{I}_{P_s}$ may be the fifth image described above. By comparing $I_{P_s}$ and $\hat{I}_{P_s}$, the value corresponding to the second loss function can be determined.
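The forward-and-back computation described above can be summarized as the following sketch; `decompose`, `H_S`, and `H_A` are assumed callables, and their interfaces are assumptions rather than definitions from the disclosure.

```python
def joint_training_cycle(H_S, H_A, decompose, img_src, p_src, p_tgt):
    """Forward-and-back pass for steps S310-S330."""
    s_src = decompose(img_src)                     # S_Ps from the target image
    s_tgt_hat = H_S(s_src, p_tgt)                  # converted semantic map
    img_tgt_hat = H_A(img_src, p_tgt, s_tgt_hat)   # fourth image
    s_src_hat = H_S(s_tgt_hat, p_src)              # convert back to source pose
    img_src_hat = H_A(img_tgt_hat, p_src, s_src_hat)  # fifth image
    return img_tgt_hat, img_src_hat  # compare img_src_hat with img_src for L_A
```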
In an exemplary embodiment of the present disclosure, optionally, the manner of updating the semantic conversion model and the image generation model according to the value corresponding to the second loss function may specifically be: determining the gradients of the network parameters according to the value corresponding to the second loss function, and updating the network parameters of the semantic conversion model and the image generation model according to a back-propagation algorithm.
In an exemplary embodiment of the present disclosure, optionally, the method further includes the following step: repeatedly executing steps S310 to S340 until the semantic conversion model and the image generation model converge, at which point the joint training of the semantic conversion model and the image generation model is judged to be complete; here, convergence of the semantic conversion model and the image generation model may also be understood as convergence of the second loss function.
In an exemplary embodiment of the present disclosure, the second loss function may include a second adversarial loss term, a pose loss term, a content consistency loss term, a semantically guided style loss term, and a face loss term. Specifically, the second loss function may correspond to the expression:

$L_A = L_{adv}^A + L_{pose} + L_{cont} + L_{style} + L_{face}$

where $L_{adv}^A$ may be the second adversarial loss term described above, and the corresponding expression may be:

$L_{adv}^A = L_{adv}(H_A, D_A, I_{P_s}, \hat{I}_{P_s})$

where $D_A$ may be the discriminator of the image generation model. $L_{pose}$ may be the pose loss term described above, and the corresponding expression may be:

$L_{pose} = \|\Phi(\hat{I}_{P_s}) - p_s\|$

where $\Phi(\cdot)$ may be a pose extractor. $L_{cont}$ may be the content consistency loss term described above, and the corresponding expression may be:

$L_{cont} = \|\Lambda(\hat{I}_{P_s}) - \Lambda(I_{P_s})\|$

where $\Lambda(\cdot)$ may be an image feature extractor, such as a layer of features obtained by inputting an image into a VGG network. $L_{style}$ may be the semantically guided style loss term described above, and the corresponding expression may be:

$L_{style} = \|\Psi(\hat{I}_{P_s}) - \Psi(I_{P_s})\|$

where $\Psi(\cdot)$ may be a semantic feature extractor for extracting the VGG features of an image under the corresponding semantic information. $L_{face}$ may be the face loss term described above, and the corresponding expression may be:

$L_{face} = L_{adv}(H_A, D_F, F(I_{P_s}), F(\hat{I}_{P_s}))$

where $D_F$ is a face discriminator and $F(\cdot)$ is a face extractor. It should be noted that if the image is a person image, the face discriminator may discriminate human faces and the face extractor may extract human faces; if the image is an animal image, the face discriminator may discriminate animal faces and the face extractor may extract animal faces.
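Assembling the five terms might look like the sketch below, where the extractors and discriminators are assumed callables standing in for $\Phi$, $\Lambda$, $\Psi$, $F$, $D_A$ and $D_F$; the unit term weights and the choice of norm are illustrative, since the disclosure does not specify them.

```python
import torch

def second_loss(img_src, img_src_hat, p_src,
                D_A, pose_fn, feat_fn, sem_feat_fn, face_fn, D_F):
    l_adv  = -torch.log(D_A(img_src_hat) + 1e-8).mean()    # L_adv^A (generator side)
    l_pose = torch.norm(pose_fn(img_src_hat) - p_src)      # pose loss
    l_cont = torch.norm(feat_fn(img_src_hat)
                        - feat_fn(img_src))                # content consistency
    l_sty  = torch.norm(sem_feat_fn(img_src_hat)
                        - sem_feat_fn(img_src))            # semantically guided style
    l_face = -torch.log(D_F(face_fn(img_src_hat))
                        + 1e-8).mean()                     # face loss (generator side)
    return l_adv + l_pose + l_cont + l_sty + l_face
```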
In addition, it should be noted that the VGG network (Visual Geometry Group network) demonstrates that increasing the depth of a network can, to some extent, improve its performance. VGG uses a stack of three 3x3 convolution kernels in place of a 7x7 kernel, and a stack of two 3x3 kernels in place of a 5x5 kernel, which increases the depth of the network, and to some extent the effectiveness of the neural network, while preserving the same receptive field.
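A quick way to see the trade-off: at 64 channels, three stacked 3x3 convolutions cover the same 7x7 input window as one 7x7 convolution but are deeper and use fewer parameters.

```python
import torch.nn as nn

c = 64
stack_3x3 = nn.Sequential(*[nn.Conv2d(c, c, kernel_size=3, padding=1)
                            for _ in range(3)])   # 7x7 receptive field overall
single_7x7 = nn.Conv2d(c, c, kernel_size=7, padding=3)
count = lambda m: sum(p.numel() for p in m.parameters())
print(count(stack_3x3), count(single_7x7))  # 110784 vs 200768: deeper, fewer params
```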
It should be noted that the words "first" and "second" in the first adversarial loss term and the second adversarial loss term are only used to distinguish two different adversarial loss terms, and do not imply any ranking or priority between them.
It can be seen that, by implementing the exemplary embodiment shown in fig. 3, the semantic conversion model and the image generation model can be continuously trained through the value of the loss function, that is, network parameters in the semantic conversion model and the image generation model are updated, so that the texture rendering effect of the image generation model on the semantic image is improved; and the image generation effect and the image generation efficiency can be improved through the joint training of the semantic conversion model and the image generation model.
As still another exemplary embodiment, the generating the second image according to the second semantic image, the object information and the first image as described above may include the steps of: and processing the second semantic image, the target information and the first image through the updated image generation model to generate a second image.
Referring to fig. 4, fig. 4 is a schematic diagram illustrating a generation process corresponding to an image generation method according to an exemplary embodiment of the present disclosure, which can be understood as an application of the image generation method of the present disclosure in generating a human image. As shown in fig. 4, the generation process includes: a first image 401, a human semantic decomposer 403, a first semantic image 405, a human semantic converter 407, a second semantic image 409, a target pose 411, a human appearance generator 413 and a second image 415; the human semantic decomposer 403 may be understood as a semantic decomposition model in the embodiment of the present disclosure, the human semantic converter 407 may be understood as a semantic conversion model in the embodiment of the present disclosure, the target pose 411 may be understood as a pose corresponding to pose information in the second image in the embodiment of the present disclosure, and the human appearance generator 413 may be understood as an image generation model in the embodiment of the present disclosure.
Specifically, when the terminal device or the server detects the first image 401, the first image 401 may be subjected to semantic decomposition through the human body semantic decomposer 403 to obtain a first semantic image 405, which is equivalent to determining the first semantic image corresponding to the first image in the embodiment of the present disclosure; furthermore, the human body semantic converter 407 processes the first semantic image 405 and the target pose 411 to obtain a second semantic image 409, which is equivalent to determining the second semantic image according to the first semantic image and the target information in the embodiment of the present disclosure; further, by processing the target pose 411, the second semantic image 409, and the first image 401 by the human appearance generator 413, a second image 415 may be obtained, which corresponds to generating the second image from the second semantic image, the target information, and the first image in the embodiments of the present disclosure.
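The Fig. 4 flow reduces to three calls, as in the sketch below; the three callables are assumptions standing in for the human semantic decomposer 403, the human semantic converter 407, and the human appearance generator 413.

```python
def generate_second_image(first_image, target_pose,
                          decomposer, converter, generator):
    first_semantic = decomposer(first_image)                  # 403 -> 405
    second_semantic = converter(first_semantic, target_pose)  # 407 -> 409
    return generator(first_image, target_pose, second_semantic)  # 413 -> 415
```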
Therefore, the generation process corresponding to the image generation method shown in fig. 4 can overcome the problem of poor image generation effect to a certain extent, and further improve the image generation effect; and the modeling of the mapping relation between the first image and the second image can be simplified through a semantic conversion process and an image generation process, so that the human body structure reconstruction effect when the person image is generated is improved.
Referring to fig. 5, fig. 5 is a schematic diagram illustrating an application of an image generation method according to an exemplary embodiment of the present disclosure. As shown in fig. 5, the application diagram of the image generation method includes a first image 501, second pose information 502, a human semantic decomposer 503, a first semantic image 504, a human semantic converter 505, a second image 506, a human appearance generator 507, a second semantic image 508, first pose information 509, a first image '510, and a first semantic image' 511.
The human semantic decomposer 503 may be understood as the semantic decomposition model in the embodiment of the present disclosure, the human semantic converter 505 may be understood as the semantic conversion model in the embodiment of the present disclosure, and the human appearance generator 507 may be understood as the image generation model in the embodiment of the present disclosure. The first image 501 may be $I_{P_s}$ described above, the second image 506 may be $\hat{I}_{P_t}$ described above, the first semantic image 504 may be $S_{P_s}$ described above, the second semantic image 508 may be $\hat{S}_{P_t}$ described above, the first image' 510 may be $\hat{I}_{P_s}$ described above, and the first semantic image' 511 may be $\hat{S}_{P_s}$ described above.
Specifically, when the terminal device or the server detects the first image 501, the first image 501 may be decomposed into the first semantic image 504 by the human semantic decomposer 503; further, the first semantic image 504 and the second pose information 502 may be processed by the human semantic converter 505 to obtain the second semantic image 508; further, the second semantic image 508, the second pose information 502, and the first image 501 may be processed by the human appearance generator 507 to obtain the second image 506. In addition, after the second semantic image 508 and the first pose information 509 pass through the human semantic converter 505, the first semantic image' 511 can be obtained; further, the first semantic image' 511, the second image 506, and the first pose information 509 may be processed by the human appearance generator 507 to obtain the first image' 510. Therefore, the image generation method of the present disclosure can not only generate a person image in one pose from a person image in another pose, but also restore the person image in the original pose from the generated image.
In terms of data, the image generation method may be an unsupervised method. In terms of human body image generation, a human body semantic structure is introduced into the algorithmic framework of the image generation method, and the human body structure can be decomposed, so that a more complete human body structure is constructed and more complete clothing attributes are retained, thereby improving the image generation effect.
Therefore, the application schematic diagram of the image generation method shown in fig. 5 can overcome the problem of poor image generation effect to a certain extent, and further improve the image generation effect; and the modeling of the mapping relation between the first image and the second image can be simplified through a semantic conversion process and an image generation process, so that the human body structure reconstruction effect when the person image is generated is improved.
Referring to fig. 6, fig. 6 is a block diagram illustrating a structure of an image generating apparatus according to an exemplary embodiment of the present disclosure. The image generation apparatus includes: a semantic decomposition unit 601, a semantic conversion unit 602, and an image generation unit 603, wherein:
a semantic decomposition unit 601, configured to determine a first semantic image corresponding to the first image; a semantic conversion unit 602, configured to determine a second semantic image according to the first semantic image and the target information; an image generation unit 603 configured to generate a second image from the second semantic image, the object information, and the first image.
Therefore, the image generation device shown in fig. 6 can overcome the problem of poor image generation effect to a certain extent, and further improve the image generation effect; and the modeling of the mapping relation between the first image and the second image can be simplified through a semantic conversion process and an image generation process, so that the human body structure reconstruction effect when the person image is generated is improved.
As an exemplary embodiment, the target information includes pose information in the second image.
As another exemplary embodiment, the image generation apparatus may further include a semantic image storage unit (not shown), wherein: and the semantic image storage unit is used for storing the first semantic image and the second semantic image as a corresponding relation.
Therefore, by implementing the exemplary embodiment, the first semantic image and the second semantic image can be stored as the corresponding relation, so that the same image can be conveniently called when being decomposed, and the image generation efficiency is further improved; and the semantic conversion model used for semantic conversion can be trained as a training sample so as to improve the semantic conversion efficiency and the semantic conversion effect.
As still another exemplary embodiment, the image generating apparatus may further include a semantic image group determining unit (not shown) and a semantic conversion model training unit (not shown), wherein: a semantic image group determining unit configured to determine a plurality of semantic image groups; each semantic image group comprises at least two semantic images; in the semantic image group, the corresponding posture information of each semantic image is different, and the corresponding semantic information of each semantic image is the same; and the semantic conversion model training unit is used for training the semantic conversion model according to the plurality of semantic image groups.
Therefore, by implementing this exemplary embodiment, the semantic conversion model can be trained on the determined plurality of semantic image groups, improving both the quality and the efficiency of image semantic conversion.
As another exemplary embodiment, the manner of determining the plurality of semantic image groups by the semantic image group determining unit may specifically be: the semantic image group determining unit determines semantic images corresponding to the images in each image set; the semantic image group determining unit matches the plurality of semantic images according to the semantic information corresponding to each semantic image to determine the plurality of semantic image groups.
Therefore, by implementing this exemplary embodiment, a plurality of semantic image groups can be determined by matching the semantic images corresponding to the images in the image sets, which improves the training of the semantic conversion model and, in turn, the efficiency of semantic conversion.
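A minimal grouping sketch follows, assuming each semantic image carries an identifier of its semantic information; the "semantic_id"/"image" dictionary schema is hypothetical.

```python
from collections import defaultdict

def determine_semantic_image_groups(semantic_images):
    """Groups semantic images whose semantic information matches but whose pose differs."""
    groups = defaultdict(list)
    for item in semantic_images:
        # Images with the same semantic information fall into the same group.
        groups[item["semantic_id"]].append(item["image"])
    # Each semantic image group must contain at least two semantic images.
    return [imgs for imgs in groups.values() if len(imgs) >= 2]
```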
As another exemplary embodiment, the way in which the semantic conversion model training unit trains the semantic conversion model according to the plurality of semantic image groups may specifically be: the semantic conversion model training unit determines a target semantic image group from the plurality of semantic image groups; the semantic conversion model training unit processes the target pose information and the first target semantic image through the semantic conversion model to generate a third image, where the target semantic image group includes the first target semantic image and the target pose information corresponds to a second target semantic image in the target semantic image group; the semantic conversion model training unit determines a value corresponding to the first loss function by comparing the third image with the second target semantic image; and the semantic conversion model training unit updates the semantic conversion model according to the value corresponding to the first loss function.
Therefore, by implementing this exemplary embodiment, the network parameters of the semantic conversion model can be updated via the loss function, improving the quality of semantic conversion on images.
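For illustration, one training step might look like the following sketch, assuming PyTorch, a converter that outputs per-pixel class logits, one-hot-encoded semantic maps, and an unweighted sum of the two loss terms; all of these are assumptions, not the disclosure's actual training recipe.

```python
import torch
import torch.nn.functional as F

def semantic_conversion_train_step(converter, discriminator, optimizer,
                                   first_target_semantic, target_pose,
                                   second_target_semantic):
    optimizer.zero_grad()
    # Generate the third image from the target pose information and the first target semantic image.
    third_image = converter(torch.cat([first_target_semantic, target_pose], dim=1))
    # Cross-entropy loss term: compare the third image with the second target semantic image,
    # treating the semantic map as per-pixel class labels.
    ce_loss = F.cross_entropy(third_image, second_target_semantic.argmax(dim=1))
    # First adversarial loss term: the discriminator should judge the third image as real.
    logits = discriminator(third_image)
    adv_loss = F.binary_cross_entropy_with_logits(logits, torch.ones_like(logits))
    loss = ce_loss + adv_loss  # the value corresponding to the first loss function
    loss.backward()
    optimizer.step()  # update the semantic conversion model's network parameters
    return loss.item()
```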
As another exemplary embodiment, the manner of determining the second semantic image according to the first semantic image and the target information by the semantic conversion unit 602 may specifically be: the semantic conversion unit 602 processes the first semantic image and the target information through the updated semantic conversion model to determine the second semantic image.
As still another exemplary embodiment, each image set includes an image and pose information to be generated corresponding to the image, and the image generation apparatus may further include: an image processing unit (not shown) and a loss function determination unit (not shown), wherein:
the image processing unit is configured to process a target image in a target image set and the pose information to be generated corresponding to the target image through the semantic conversion model and the image generation model to generate a fourth image, the fourth image corresponding to the pose information to be generated for the target image, where the plurality of image sets includes the target image set; the image processing unit is further configured to process the fourth image and the pose information corresponding to the target image through the semantic conversion model and the image generation model to generate a fifth image, the fifth image corresponding to the pose information of the target image; the loss function determining unit is configured to determine a value corresponding to the second loss function by comparing the target image with the fifth image; and the image generation unit 603 is further configured to update the semantic conversion model and the image generation model according to the value corresponding to the second loss function.
Therefore, by implementing this exemplary embodiment, the semantic conversion model and the image generation model can be trained continuously through the value of the loss function, that is, the network parameters of both models are updated, which improves the texture rendering of semantic images by the image generation model; and the joint training of the semantic conversion model and the image generation model improves both the quality and the efficiency of image generation.
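For illustration, the cycle-style joint update might be sketched as follows, assuming PyTorch; generate() is a hypothetical helper that chains the semantic conversion model and the image generation model, the individual loss terms are supplied as callables, and the unweighted sum is an assumption.

```python
import torch.nn.functional as F

def joint_train_step(generate, optimizer, target_image, target_pose, pose_to_generate,
                     adversarial_term, pose_term, style_term, face_term):
    optimizer.zero_grad()
    # Fourth image: render the target image under the pose information to be generated.
    fourth_image = generate(target_image, pose_to_generate)
    # Fifth image: map the fourth image back under the target image's own pose.
    fifth_image = generate(fourth_image, target_pose)
    # Content consistency loss term: the fifth image should reconstruct the target image.
    content_loss = F.l1_loss(fifth_image, target_image)
    # The value corresponding to the second loss function (remaining terms as callables).
    loss = (content_loss
            + adversarial_term(fourth_image)
            + pose_term(fourth_image, pose_to_generate)
            + style_term(fourth_image, target_image)
            + face_term(fifth_image, target_image))
    loss.backward()
    optimizer.step()  # update both models' network parameters jointly
    return loss.item()
```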
As another exemplary embodiment, the manner in which the image generation unit 603 generates the second image according to the second semantic image, the target information, and the first image may specifically be: the image generation unit 603 processes the second semantic image, the target information, and the first image through the updated image generation model to generate the second image.
As yet another exemplary embodiment, the first loss function includes a first adversarial loss term and a cross-entropy loss term.
As yet another exemplary embodiment, the second loss function includes a second adversarial loss term, a pose loss term, a content consistency loss term, a semantically guided style loss term, and a face loss term.
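For concreteness, both loss functions can be viewed as weighted sums of their constituent terms; the weights λ below are hypothetical hyperparameters not specified in the disclosure:

L_first = λ1·L_adversarial_1 + λ2·L_cross_entropy

L_second = λ3·L_adversarial_2 + λ4·L_pose + λ5·L_content + λ6·L_style + λ7·L_face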
For details not disclosed in the apparatus embodiments of the present disclosure, please refer to the embodiments of the image generation method described above in the present disclosure.
Referring to FIG. 7, FIG. 7 illustrates a block diagram of a computer system 700 suitable for implementing the electronic device of an exemplary embodiment of the present disclosure. The computer system 700 shown in fig. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present disclosure.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU)701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for system operation are also stored. The CPU701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
The following components are connected to the I/O interface 705: an input portion 706 including a keyboard, a mouse, and the like; an output section 707 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is mounted into the storage section 708 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. When executed by the Central Processing Unit (CPU) 701, the computer program performs the above-described functions defined in the system of the present application.
It should be noted that the computer readable media shown in the present disclosure may be computer readable signal media or computer readable storage media or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or by hardware, and the described units may also be disposed in a processor. In some cases, the names of the units do not constitute a limitation on the units themselves.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the electronic device described in the above embodiments; or may exist separately without being assembled into the electronic device. The computer-readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to implement the image generation method as described in the above embodiments.
For example, the electronic device may implement the steps shown in fig. 1: step S110: determining a first semantic image corresponding to the first image; step S120: determining a second semantic image according to the first semantic image and the target information; and step S130: generating a second image according to the second semantic image, the target information, and the first image.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, the features and functions of two or more modules or units described above may be embodied in one module or unit according to embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided and embodied by a plurality of modules or units.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and includes several instructions to enable a computing device (such as a personal computer, a server, a touch terminal, or a network device) to execute the method according to the embodiments of the present disclosure.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (14)

1. An image generation method, comprising:
determining a first semantic image corresponding to the first image;
determining a second semantic image according to the first semantic image and the target information;
and generating a second image according to the second semantic image, the target information and the first image.
2. The method of claim 1, wherein the target information comprises pose information in the second image.
3. The method of claim 1, further comprising:
storing the first semantic image and the second semantic image as a correspondence.
4. The method of claim 1, further comprising:
determining a plurality of semantic image groups; each semantic image group comprises at least two semantic images; in each semantic image group, the pose information corresponding to each semantic image is different, and the semantic information corresponding to each semantic image is the same;
and training a semantic conversion model according to the plurality of semantic image groups.
5. The method of claim 4, wherein determining a plurality of sets of semantic images comprises:
determining semantic images corresponding to the images in each image set;
and matching the semantic images according to the semantic information corresponding to each semantic image to determine a plurality of semantic image groups.
6. The method of claim 4, wherein training a semantic conversion model from a plurality of the semantic image sets comprises:
determining a target semantic image group from a plurality of semantic image groups;
processing target pose information and a first target semantic image through a semantic conversion model to generate a third image; wherein the target semantic image group comprises the first target semantic image, and the target pose information corresponds to a second target semantic image in the target semantic image group;
determining a numerical value corresponding to a first loss function according to the comparison between the third image and the second target semantic image;
and updating the semantic conversion model according to the value corresponding to the first loss function.
7. The method of claim 6, wherein determining a second semantic image from the first semantic image and target information comprises:
processing the first semantic image and the target information through the updated semantic conversion model to determine the second semantic image.
8. The method of claim 5, wherein each of the image sets includes the image and pose information to be generated corresponding to the image, the method further comprising:
processing a target image in a target image set and pose information to be generated corresponding to the target image according to the semantic conversion model and an image generation model to generate a fourth image, wherein the fourth image corresponds to the pose information to be generated corresponding to the target image; wherein the plurality of image sets includes the target image set;
processing the fourth image and pose information corresponding to the target image according to the semantic conversion model and the image generation model to generate a fifth image, wherein the fifth image corresponds to the pose information of the target image;
determining a numerical value corresponding to a second loss function according to the comparison of the target image and the fifth image;
and updating the semantic conversion model and the image generation model according to the numerical value corresponding to the second loss function.
9. The method of claim 8, wherein generating a second image from the second semantic image, the target information, and the first image comprises:
processing the second semantic image, the target information, and the first image through the updated image generation model to generate the second image.
10. The method of claim 6, wherein the first loss function includes a first adversarial loss term and a cross-entropy loss term.
11. The method of claim 8, wherein the second loss function comprises a second adversarial loss term, a pose loss term, a content consistency loss term, a semantically guided style loss term, and a face loss term.
12. An image generation apparatus, comprising:
the semantic decomposition unit is used for determining a first semantic image corresponding to the first image;
the semantic conversion unit is used for determining a second semantic image according to the first semantic image and the target information;
an image generating unit configured to generate a second image according to the second semantic image, the target information, and the first image.
13. A computer-readable medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out an image generation method according to any one of claims 1 to 11.
14. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to carry out an image generation method as claimed in any one of claims 1 to 11.
CN201910296009.4A 2019-04-12 2019-04-12 Image generation method and device, computer readable medium and electronic equipment Pending CN111815756A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910296009.4A CN111815756A (en) 2019-04-12 2019-04-12 Image generation method and device, computer readable medium and electronic equipment


Publications (1)

Publication Number Publication Date
CN111815756A 2020-10-23

Family

ID=72843995

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910296009.4A Pending CN111815756A (en) 2019-04-12 2019-04-12 Image generation method and device, computer readable medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111815756A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130330060A1 (en) * 2010-11-29 2013-12-12 Hans-Peter Seidel Computer-implemented method and apparatus for tracking and reshaping a human shaped figure in a digital world video
CN104732506A (en) * 2015-03-27 2015-06-24 浙江大学 Character picture color style converting method based on face semantic analysis
CN108197589A (en) * 2018-01-19 2018-06-22 北京智能管家科技有限公司 Semantic understanding method, apparatus, equipment and the storage medium of dynamic human body posture
CN108805803A (en) * 2018-06-13 2018-11-13 衡阳师范学院 A kind of portrait style moving method based on semantic segmentation Yu depth convolutional neural networks



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination