CN112381927A - Image generation method, device, equipment and storage medium


Info

Publication number
CN112381927A
Authority
CN
China
Prior art keywords
image
three-dimensional model
target person
positions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011301138.7A
Other languages
Chinese (zh)
Inventor
杨新航
陈睿智
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202011301138.7A
Publication of CN112381927A
Legal status: Pending

Classifications

    • G06T 17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06N 3/02: Neural networks; G06N 3/08: Learning methods (computing arrangements based on biological models)
    • G06T 19/20: Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts
    • G06T 2207/20221: Image fusion; Image merging (image combination)
    • G06T 2207/30196: Human being; Person (subject of image; context of image processing)


Abstract

The application discloses an image generation method, apparatus, device, and storage medium, relating to the fields of deep learning, computer vision, cloud computing, and the like. The specific implementation scheme is as follows: a three-dimensional model is determined according to a target person contained in a received image; the three-dimensional model is adjusted according to the features of the target person to obtain an adjusted three-dimensional model; and the adjusted three-dimensional model is fused with the target person contained in the image to generate a fused image. On one hand, the similarity between the generated image and the target person can be improved. On the other hand, since the generated image is not the original face image of the target person, the privacy of the target person's face can be protected.

Description

Image generation method, device, equipment and storage medium
Technical Field
The present application relates to the field of image processing, and in particular, to the fields of deep learning, computer vision, cloud computing, and the like.
Background
Cartoon avatars generated from photos of real people are increasingly popular with the public. In the conventional technique, a user uploads an image and the face in the uploaded image is pasted into a cartoon avatar template. This scheme yields poor fusion quality.
A later development applies cartoon-style deformation to the face using a generative adversarial network (GAN) to obtain a cartoon avatar. However, this scheme introduces a high degree of distortion, and the result bears low similarity to the user's face.
Disclosure of Invention
The application provides a method, a device, equipment and a storage medium for generating an image.
According to an aspect of the present application, there is provided a method of image generation, which may include the steps of:
determining a three-dimensional model according to a target person contained in the received image;
adjusting the three-dimensional model according to the characteristics of the target person to obtain an adjusted three-dimensional model;
and fusing the adjusted three-dimensional model with a target person contained in the image to generate a fused image.
According to another aspect of the present application, there is provided an apparatus for image generation, which may include:
a three-dimensional model determination module for determining a three-dimensional model according to a target person included in the received image;
the three-dimensional model adjusting module is used for adjusting the three-dimensional model according to the characteristics of the target person to obtain an adjusted three-dimensional model;
and the image fusion module is used for fusing the adjusted three-dimensional model with a target person contained in the image to generate a fused image.
In a third aspect, an embodiment of the present application provides an electronic device, including:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to cause the at least one processor to perform a method provided by any one of the embodiments of the present application.
In a fourth aspect, embodiments of the present application provide a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a method provided by any one of the embodiments of the present application.
According to the technology of the application, on one hand, the similarity between the generated image and the target person can be improved. On the other hand, since the generated image is not the original face image of the target person, the privacy of the target person's face can be protected.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present application, nor do they limit the scope of the present application. Other features of the present application will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a flow chart of a method of image generation according to the present application;
FIG. 2 is a schematic diagram of a three-dimensional model according to the present application;
FIG. 3 is a flow chart of adjusting a three-dimensional model according to the present application;
FIG. 4 is a schematic illustration of adjusting an expression of a three-dimensional model according to the present application;
FIG. 5 is a flow chart of adjusting a three-dimensional model according to the present application;
FIG. 6 is a flow chart for fusing images according to the present application;
FIG. 7 is a flow chart for fusing images according to the present application;
FIG. 8 is a schematic illustration of fusing images according to the present application;
FIG. 9 is a flow chart for fusing images according to the present application;
FIG. 10 is a schematic diagram of an apparatus for image generation according to the present application;
FIG. 11 is a block diagram of an electronic device for implementing the method of image generation according to an embodiment of the present application.
Detailed Description
The following describes exemplary embodiments of the present application with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
The embodiment of the application provides an image generation method. As shown in fig. 1, the method may include the steps of:
s101: determining a three-dimensional model according to a target person contained in the received image;
s102: adjusting the three-dimensional model according to the characteristics of the target person to obtain an adjusted three-dimensional model;
s103: and fusing the adjusted three-dimensional model with a target person contained in the image to generate a fused image.
The execution body of the method may be an intelligent terminal such as a mobile phone, or a cloud or a server in remote communication with the intelligent terminal. The following description takes the cloud as the execution body by way of example.
The user can capture an image containing the target person with a mobile phone or another intelligent terminal. Where the image contains multiple persons, the user may designate at least one of them as the target person.
The intelligent terminal uploads the image to the cloud. Referring to fig. 2, a plurality of three-dimensional models may be prestored in the cloud. These three-dimensional models can be constructed in advance according to factors such as age, gender, and face shape.
By analyzing the target person in the image, the cloud can determine the three-dimensional model with the highest degree of matching to the target person. The analysis may consider factors such as the age, gender, and face shape of the target person, and can be implemented with a trained neural network model for three-dimensional model determination: the image containing the target person is input to this model, which selects the three-dimensional model that best matches the target person.
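By way of illustration only, the following is a minimal Python (PyTorch) sketch of this selection step. It assumes a small hypothetical classifier, here called ModelSelector, trained to map a face image to the identifier (ID) of the best-matching prestored three-dimensional model; the class name, architecture, and function names are illustrative assumptions, not details given in the application.

```python
# Minimal sketch of three-dimensional model selection (illustrative only).
# Assumes a small CNN trained to output the ID of the best-matching
# prestored 3D model; all names are hypothetical.
import torch
import torch.nn as nn

class ModelSelector(nn.Module):
    def __init__(self, num_models: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.head = nn.Linear(32, num_models)  # one logit per prestored 3D model

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        x = self.features(image).flatten(1)
        return self.head(x)

def select_model_id(selector: ModelSelector, image: torch.Tensor) -> int:
    """Return the ID of the prestored 3D model that best matches the face."""
    with torch.no_grad():
        logits = selector(image.unsqueeze(0))  # add a batch dimension
    return int(logits.argmax(dim=1).item())
```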
After a three-dimensional model matching the target person is selected, it may be adjusted to bring it closer to the target person in the image.
The adjustment may be based on the features of the target person in the image. The features may include, among other things, the face shape, five sense organs, facial expression, and skin tone of the target person.
For example, the face of the three-dimensional model may be fine-tuned laterally, longitudinally, or locally to bring it closer to the face of the target person.
For another example, the shape and position of the five sense organs (i.e., the facial features: eyes, eyebrows, nose, mouth, and ears) of the three-dimensional model may be adjusted, such as the shape of the eyes, the position of the eyes on the face, and the positions of the eyes relative to the nose and eyebrows.
In addition, the three-dimensional model can be adjusted according to the expression, skin color, and the like of the target person.
The three-dimensional model adjusted in the above manner can approach the appearance of the target person. The adjusted three-dimensional model is then fused with the target person in the image to generate a fused image. For example, the fusion may include performing two-dimensional processing on the three-dimensional model so that the result replaces the target person in the image. The fusion may further include performing mask processing on the target person and combining the two-dimensionally processed image with the mask to generate the fused image.
It should be noted that the neural network model for three-dimensional model determination may be trained with a large number of face samples and corresponding three-dimensional model samples. For example, face samples of different ages, genders, and face shapes are grouped in advance and used as input, and the model is trained with the identifier (ID) of the three-dimensional model corresponding to each age, gender, and face shape as output. When the error between the output of the neural network model for three-dimensional model determination on a test sample and the ground-truth result is within an allowable range, training of the neural network model is finished.
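A minimal training sketch for such a selector is given below, assuming labeled pairs of face images and three-dimensional model IDs are available through a PyTorch DataLoader; the dataset interface and hyperparameters are illustrative assumptions rather than details specified in the application.

```python
# Illustrative training loop for the model-selection network (assumptions:
# `loader` yields (face_image, model_id) batches; hyperparameters are arbitrary).
import torch
import torch.nn as nn

def train_selector(selector: nn.Module, loader, epochs: int = 10, lr: float = 1e-3):
    optimizer = torch.optim.Adam(selector.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()  # classification over prestored model IDs
    for _ in range(epochs):
        for images, model_ids in loader:
            optimizer.zero_grad()
            loss = criterion(selector(images), model_ids)
            loss.backward()
            optimizer.step()
    return selector
```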
The above example of the present application is explained by taking the fusion of the head region of the target person with the three-dimensional model as an example. The application is not limited to this; for example, image generation may also be performed for the entire region of the target person. In addition, the five sense organs, torso, and the like of the user in the fused image may be beautified or cartoonized to make the fused image more interesting.
Through this scheme, on the one hand, the similarity between the generated image and the target person can be improved. On the other hand, since the generated image is not the original face image of the target person, the privacy of the target person's face can be protected.
In one embodiment, as shown in fig. 3, the adjusting of the three-dimensional model according to the characteristics of the target person in step S102 may include the following sub-steps:
s1021: determining at least one of facial features and expressive features of the target person;
s1022: and adjusting the three-dimensional model according to the facial features and/or the expression features.
The face-shape features and the expression features of the target person can be embodied in parameter form, for example as face-shape parameters and expression parameters.
The face-shape parameters and the expression parameters can be computed by a face-shape parameter neural network model and an expression parameter neural network model, respectively.
Both models can be trained in advance. Taking the training of the expression parameter neural network model as an example, image samples of different expressions, such as smiling, laughing, being startled, and making faces, are obtained in advance. Taking smiling as an example, an image sample of the smiling expression is fed into the input layer of the expression parameter neural network model to be trained, and the output layer yields the smiling expression and its degree. The model is likewise trained with image samples of the other expressions.
When the error between the output of the expression parameter neural network model on a test sample and the ground-truth result is within an allowable range, training of the expression parameter neural network model is finished.
The face-shape parameter neural network model is trained in a similar way. For example, image samples of different face shapes may be obtained in advance; face shapes may be classified according to the shape of the face into various types, such as round, oval, inverted oval, square, rectangular, trapezoidal, inverted trapezoidal, and diamond. The face-shape parameter neural network model is then trained using the face-shape samples and the corresponding face-shape classifications.
Once the face-shape features and expression features of the target person (namely, the face-shape parameters and expression parameters) are obtained using the face-shape parameter neural network model and the expression parameter neural network model, the three-dimensional model can be adjusted using these features.
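To illustrate how such parameters might drive the adjustment, the following sketch uses the common blendshape formulation, in which the adjusted mesh equals a base mesh plus parameter-weighted sums of shape and expression offsets. The blendshape representation is an assumption of this sketch; the application states only that face-shape and expression parameters are used to adjust the three-dimensional model.

```python
# Illustrative blendshape-style adjustment (the blendshape formulation is an
# assumption; the patent only states that face-shape and expression parameters
# drive the adjustment of the 3D model).
import numpy as np

def adjust_model(base_vertices: np.ndarray,   # (V, 3) neutral mesh
                 shape_basis: np.ndarray,     # (S, V, 3) face-shape offsets
                 expr_basis: np.ndarray,      # (E, V, 3) expression offsets
                 shape_params: np.ndarray,    # (S,) from the face-shape network
                 expr_params: np.ndarray) -> np.ndarray:  # (E,) from the expression network
    """Return adjusted vertices = base + weighted shape and expression offsets."""
    shape_offset = np.tensordot(shape_params, shape_basis, axes=1)  # (V, 3)
    expr_offset = np.tensordot(expr_params, expr_basis, axes=1)     # (V, 3)
    return base_vertices + shape_offset + expr_offset
```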
Taking expression adjustment as an example in conjunction with fig. 4: the expression on the left side of fig. 4 may illustrate the result of adjusting the three-dimensional model with a "making a funny face" expression; the expression in the middle of fig. 4 may illustrate the result of adjusting the three-dimensional model with a "smirking" expression; and the expression on the right side of fig. 4 may illustrate the result of adjusting the three-dimensional model with a "surprised" expression.
Through this scheme, the three-dimensional model is coarsely adjusted using the face-shape feature and/or the expression feature, so that it matches the target person in outer contour, expression, and the like.
As shown in fig. 5, in one embodiment, the adjusting of the three-dimensional model according to the characteristics of the target person in step S102 may further include the following sub-steps:
s1023: respectively determining the positions of the characteristic points of the target person and the positions of the characteristic points of the three-dimensional model;
s1024: and adjusting the positions of the feature points of the three-dimensional model according to the positions of the feature points of the target person, so that the error between the positions of the feature points of the adjusted three-dimensional model and the positions of the feature points of the target person is within an allowable range.
The feature points may be the outline of the five sense organs, the outline of the face, etc. of the target person, and the shape of the five sense organs, the relative positional relationship between the five sense organs and the face, etc. of the target person may be represented by the positions of the feature points.
The positions of the feature points of the target person and of the three-dimensional model are determined respectively, and the positions of the feature points of the three-dimensional model are adjusted according to those of the target person. The adjustment continues until the error between the positions of the feature points of the three-dimensional model and the positions of the feature points of the target person is within an allowable range.
The above steps (step S1023 and step S1024) may be performed after the aforementioned step S1022, or may be performed independently of step S1021 and step S1022, that is, step S1023 and step S1024 are directly performed without performing step S1021 and step S1022.
When executed after the aforementioned step S1022, the face contour (face shape) of the three-dimensional model has already been preliminarily adjusted with reference to the target person, so this step uses the positions of the feature points on the face to adjust the five sense organs of the three-dimensional model. For example, the shapes of the five sense organs, their positions on the face, and their relative positions in the three-dimensional model are adjusted so that the three-dimensional model as a whole comes closer to the target person.
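As a rough illustration of this feature-point refinement, the sketch below iteratively moves the model's feature points toward the target person's feature points until the mean position error falls within an allowable range; adjusting the points directly (rather than through model parameters) is a simplifying assumption of this sketch.

```python
# Illustrative feature-point alignment (assumption: feature points are moved
# directly; the patent only requires the final position error to fall within
# an allowable range).
import numpy as np

def align_feature_points(model_pts: np.ndarray,   # (N, 2) model feature points
                         target_pts: np.ndarray,  # (N, 2) target-person feature points
                         tol: float = 0.5,        # allowable mean error (pixels)
                         step: float = 0.5,
                         max_iters: int = 1000) -> np.ndarray:
    pts = model_pts.astype(np.float64).copy()
    for _ in range(max_iters):
        residual = target_pts - pts
        error = np.linalg.norm(residual, axis=1).mean()
        if error <= tol:        # error is within the allowable range
            break
        pts += step * residual  # move feature points toward the target
    return pts
```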
As shown in fig. 6, in one embodiment, step S103 may include the following sub-steps:
s1031: performing two-dimensional processing on the adjusted three-dimensional model to obtain a two-dimensional image;
s1032: adjusting the positions of the feature points in the two-dimensional image by using the positions of the feature points of the target person;
s1033: and covering the target person in the image by using the adjusted two-dimensional image to generate a fused image.
After the adjusted three-dimensional model is obtained, in order to make the degree of matching between the three-dimensional model and the target person in the image higher, the adjusted three-dimensional model may be subjected to two-dimensional processing, so as to obtain a two-dimensional image corresponding to the three-dimensional model.
Further, after the two-dimensional image is obtained, the feature points of the original three-dimensional model are mapped to the two-dimensional image to obtain the feature points of the two-dimensional image.
The positions of the feature points of the two-dimensional image are then adjusted according to the positions of the feature points of the target person in the image. The purpose of the adjustment is as follows: the two-dimensional image may be warped so that its face edge, lower nose edge, and lower eyebrow edge are fully aligned with those of the target person in the image. After the edges are adjusted, the eyes of the two-dimensional image are correspondingly aligned to the positions of the eyes of the target person in the original image, thereby generating the fused image.
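One plausible realization of this feature-point-driven warp is a piecewise-affine image warp, sketched below with scikit-image; the specific warping method is an assumption, since the application does not name one.

```python
# Illustrative landmark-driven warp of the rendered 2D image onto the target
# person's feature points (piecewise-affine warping is an assumed choice).
import numpy as np
from skimage.transform import PiecewiseAffineTransform, warp

def warp_to_target(rendered: np.ndarray,   # (H, W, 3) 2D image of the model
                   src_pts: np.ndarray,    # (N, 2) feature points in `rendered`, (x, y)
                   dst_pts: np.ndarray) -> np.ndarray:  # (N, 2) target feature points
    """Warp the rendered image so its feature points land on the target's."""
    tform = PiecewiseAffineTransform()
    tform.estimate(dst_pts, src_pts)  # skimage's warp uses the inverse mapping
    return warp(rendered, tform, output_shape=rendered.shape[:2])
```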
Through this scheme, a two-dimensional image is obtained by performing two-dimensional processing on the adjusted three-dimensional model, and the two-dimensional image is then adjusted according to the target person, yielding a head-region image that better matches the target person in the image.
As shown in fig. 7, in an embodiment, step S103 may further include the following sub-steps:
s1034: acquiring a mask of a head area of a target person in an image;
s1035: and combining the mask with the head area of the target person in the two-dimensional image to generate a fused image.
The mask of the head region of the target person in the image is acquired in order to distinguish the target person from the background. In addition, the mask can also be used to extract information such as the hairstyle and glasses of the target person.
As shown in fig. 8, the left side of fig. 8 is the mask of the target person's head region. As can be seen from the figure, the mask captures the hairstyle and glasses of the target person and distinguishes the target person from the background of the image. The image in the middle of fig. 8 is the two-dimensional image obtained by performing two-dimensional processing on the adjusted three-dimensional model in the above step. The right side of fig. 8 shows the effect of combining the mask of the target person's head region with the head region of the target person in the two-dimensional image.
Through this scheme, the mask distinguishes the target person from the background in the image, enabling a final correction of the face in the two-dimensional image so that it reproduces the face of the target person with higher fidelity. Further, the hairstyle, glasses, and the like of the target person can be obtained from the mask, and combining this information with the two-dimensional image makes the fused image even closer to the target person.
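A minimal compositing sketch for this mask-based combination is given below, assuming a soft head-region mask with values in [0, 1]; the alpha-blending formulation is an assumption, as the application states only that the mask is combined with the head region of the two-dimensional image.

```python
# Illustrative mask-based combination (alpha blending is an assumed choice).
import numpy as np

def combine_with_mask(original: np.ndarray,   # (H, W, 3) original photo
                      rendered: np.ndarray,   # (H, W, 3) warped 2D model image
                      mask: np.ndarray) -> np.ndarray:  # (H, W), values in [0, 1]
    """Blend the rendered head into the original image using the head mask."""
    alpha = mask[..., None]  # broadcast the mask over the color channels
    return (alpha * rendered + (1.0 - alpha) * original).astype(original.dtype)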
As shown in fig. 9, in an embodiment, step S103 may further include the following sub-steps:
s1036: determining a skin color characteristic of a head area of a target person in an image;
s1037: and adjusting the skin color of the head area in the two-dimensional image by using the skin color characteristic.
Due to factors such as illumination, the skin color of the head region of the target person in the image is in most cases unevenly lit. The skin color of the head region of the target person in the image can therefore be analyzed to determine a skin color feature. The determined skin color feature is then applied to the head region in the two-dimensional image; that is, the skin color of the head region in the two-dimensional image is adjusted using the skin color feature, so that the final image is closer to the target person.
The determination of the skin color feature in this step can be implemented with a pre-trained skin color feature neural network model. To train the skin color feature neural network model, image samples with different skin color features can be selected in advance as the input of the model to be trained, with the corresponding skin color results as its output.
When the error between the output of the skin color feature neural network on a test sample and the ground-truth result is within an allowable range, training of the skin color feature neural network is finished.
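One simple way to realize such a skin-color adjustment is Reinhard-style mean and standard-deviation matching in the LAB color space, restricted to the head region, as sketched below; this statistic-matching technique is an assumption, since the application leaves the adjustment method unspecified.

```python
# Illustrative Reinhard-style skin-color transfer in LAB space (an assumed
# technique; the patent does not specify how the skin color feature is applied).
import cv2
import numpy as np

def transfer_skin_color(rendered: np.ndarray,   # (H, W, 3) uint8 BGR head image to adjust
                        reference: np.ndarray,  # (H, W, 3) uint8 BGR target-person head
                        mask: np.ndarray) -> np.ndarray:  # (H, W) bool head region
    src = cv2.cvtColor(rendered, cv2.COLOR_BGR2LAB).astype(np.float32)
    ref = cv2.cvtColor(reference, cv2.COLOR_BGR2LAB).astype(np.float32)
    out = src.copy()
    for c in range(3):  # match mean and spread per LAB channel inside the mask
        s_mu, s_sd = src[mask, c].mean(), src[mask, c].std() + 1e-6
        r_mu, r_sd = ref[mask, c].mean(), ref[mask, c].std() + 1e-6
        out[mask, c] = (src[mask, c] - s_mu) * (r_sd / s_sd) + r_mu
    out = np.clip(out, 0, 255).astype(np.uint8)
    return cv2.cvtColor(out, cv2.COLOR_LAB2BGR)
```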
As shown in fig. 10, in one embodiment, the present application further provides an image generation apparatus, which may include the following components:
a three-dimensional model determination module 1001 configured to determine a three-dimensional model according to a target person included in the received image;
a three-dimensional model adjusting module 1002, configured to adjust a three-dimensional model according to characteristics of a target person to obtain an adjusted three-dimensional model;
an image fusion module 1003, configured to fuse the adjusted three-dimensional model with a target person included in the image, and generate a fused image.
In one embodiment, the three-dimensional model adjustment module 1002 may further comprise:
the feature determination submodule is used for determining at least one of the facial feature and the expression feature of the target person;
and the three-dimensional model adjusting execution submodule is used for adjusting the three-dimensional model according to the facial feature and/or the expression feature.
In one embodiment, the three-dimensional model adjustment module 1002 may further include:
the characteristic point position determining submodule is used for respectively determining the positions of the characteristic points of the target person and the positions of the characteristic points of the three-dimensional model;
and the three-dimensional model adjustment execution sub-module is used for adjusting the positions of the characteristic points of the three-dimensional model according to the positions of the characteristic points of the target person, so that the error between the positions of the characteristic points of the adjusted three-dimensional model and the positions of the characteristic points of the target person is within an allowable range.
In one embodiment, the image fusion module 1003 may further include:
the two-dimensional processing submodule is used for carrying out two-dimensional processing on the adjusted three-dimensional model to obtain a two-dimensional image;
the two-dimensional image adjusting submodule is used for adjusting the positions of the characteristic points in the two-dimensional image by utilizing the positions of the characteristic points of the target person;
and the image fusion execution submodule is used for covering the target person in the image by using the adjusted two-dimensional image and generating a fused image.
In one embodiment, the image fusion performing sub-module may further include:
a mask acquisition unit configured to acquire a mask of a head region of a target person in an image;
and the image fusion execution unit is used for combining the mask and the head area of the target person in the two-dimensional image to generate a fused image.
In one embodiment, the image fusion execution sub-module may further include:
the skin color characteristic determining unit is used for determining the skin color characteristic of the head area of the target person in the image;
and the skin color adjusting unit is used for adjusting the skin color of the head area in the two-dimensional image by utilizing the skin color characteristics.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 11, is a block diagram of an electronic device according to a method of image generation of an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 11, the electronic apparatus includes: one or more processors 1110, a memory 1120, and interfaces for connecting the various components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions for execution within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, as desired. Also, multiple electronic devices may be connected, with each device providing a portion of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 11 takes one processor 1110 as an example.
The memory 1120 is a non-transitory computer readable storage medium provided herein. Wherein the memory stores instructions executable by at least one processor to cause the at least one processor to perform the method of image generation provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the method of image generation provided herein.
The memory 1120, which is a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the method of image generation in the embodiments of the present application (for example, the three-dimensional model determination module 1001, the three-dimensional model adjustment module 1002, and the image fusion module 1003 shown in fig. 10). The processor 1110 executes various functional applications of the server and data processing, i.e., implements the method of image generation in the above-described method embodiments, by executing non-transitory software programs, instructions, and modules stored in the memory 1120.
The memory 1120 may include a program storage area and a data storage area, wherein the program storage area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device of the method of image generation, and the like. Further, the memory 1120 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 1120 optionally includes memory remotely located from the processor 1110, and such remote memory may be connected to the electronics of the method of image generation over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method of image generation may further include: an input device 1130 and an output device 1140. The processor 1110, the memory 1120, the input device 1130, and the output device 1140 may be connected by a bus or other means, and the bus connection is exemplified in fig. 11.
The input device 1130 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic apparatus of the method of image generation, such as an input device of a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, a joystick, etc. The output devices 1140 may include a display device, auxiliary lighting devices (e.g., LEDs), and tactile feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device can be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service scalability in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, which is not limited herein as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (14)

1. A method of image generation, comprising:
determining a three-dimensional model according to a target person contained in the received image;
adjusting the three-dimensional model according to the characteristics of the target person to obtain an adjusted three-dimensional model;
and fusing the adjusted three-dimensional model with the target person contained in the image to generate a fused image.
2. The method of claim 1, wherein the adjusting of the three-dimensional model according to the characteristics of the target person comprises:
determining at least one of a facial feature and an expressive feature of the target person;
and adjusting the three-dimensional model according to the facial feature and/or the expression feature.
3. The method of claim 1 or 2, wherein the adjusting of the three-dimensional model according to the characteristics of the target person comprises:
respectively determining the positions of the characteristic points of the target person and the positions of the characteristic points of the three-dimensional model;
and adjusting the positions of the characteristic points of the three-dimensional model according to the positions of the characteristic points of the target person, so that the error between the positions of the characteristic points of the three-dimensional model after adjustment and the positions of the characteristic points of the target person is within an allowable range.
4. The method of claim 1, wherein fusing the adjusted three-dimensional model with the target person included in the image comprises:
performing two-dimensional processing on the adjusted three-dimensional model to obtain a two-dimensional image;
adjusting the positions of the feature points in the two-dimensional image by using the positions of the feature points of the target person;
and covering the target person in the image by using the adjusted two-dimensional image to generate a fused image.
5. The method of claim 4, wherein the overlaying the target person in the image with the adjusted two-dimensional image to generate a fused image comprises:
acquiring a mask of a head area of the target person in the image;
and combining the mask with the head area of the target person in the two-dimensional image to generate a fused image.
6. The method of claim 4 or 5, wherein fusing the adjusted three-dimensional model with the target person included in the image, further comprises:
determining a skin color feature of a head region of the target person in the image;
and adjusting the skin color of the head area in the two-dimensional image by using the skin color characteristic.
7. An apparatus of image generation, comprising:
a three-dimensional model determination module for determining a three-dimensional model according to a target person included in the received image;
the three-dimensional model adjusting module is used for adjusting the three-dimensional model according to the characteristics of the target person to obtain an adjusted three-dimensional model;
and the image fusion module is used for fusing the adjusted three-dimensional model with the target person contained in the image to generate a fused image.
8. The apparatus of claim 7, wherein the three-dimensional model adjustment module comprises:
a feature determination submodule for determining at least one of a facial feature and an expressive feature of the target person;
and the three-dimensional model adjusting execution submodule is used for adjusting the three-dimensional model according to the facial feature and/or the expression feature.
9. The apparatus of claim 7 or 8, wherein the three-dimensional model adjustment module comprises:
the characteristic point position determining submodule is used for respectively determining the positions of the characteristic points of the target person and the positions of the characteristic points of the three-dimensional model;
and the three-dimensional model adjustment execution sub-module is used for adjusting the positions of the characteristic points of the three-dimensional model according to the positions of the characteristic points of the target person, so that the error between the positions of the characteristic points of the adjusted three-dimensional model and the positions of the characteristic points of the target person is within an allowable range.
10. The apparatus of claim 7, wherein the image fusion module comprises:
the two-dimensional processing submodule is used for carrying out two-dimensional processing on the adjusted three-dimensional model to obtain a two-dimensional image;
the two-dimensional image adjusting submodule is used for adjusting the positions of the characteristic points in the two-dimensional image by utilizing the positions of the characteristic points of the target person;
and the image fusion execution submodule is used for covering the target person in the image by using the adjusted two-dimensional image to generate a fused image.
11. The apparatus of claim 10, wherein the image fusion performing sub-module comprises:
a mask acquisition unit configured to acquire a mask of a head region of the target person in the image;
and the image fusion execution unit is used for combining the mask and the head area of the target person in the two-dimensional image to generate a fused image.
12. The apparatus of claim 10 or 11, wherein the image fusion performing sub-module further comprises:
a skin color feature determination unit for determining a skin color feature of a head region of the target person in the image;
and the skin color adjusting unit is used for adjusting the skin color of the head area in the two-dimensional image by using the skin color characteristics.
13. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 6.
14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 6.
Priority Applications (1)

Application Number: CN202011301138.7A; Priority Date: 2020-11-19; Filing Date: 2020-11-19; Title: Image generation method, device, equipment and storage medium

Publications (1)

Publication Number: CN112381927A; Publication Date: 2021-02-19

Family ID: 74585166; Country: CN


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116902A (en) * 2011-11-16 2013-05-22 华为软件技术有限公司 Three-dimensional virtual human head image generation method, and method and device of human head image motion tracking
CN108510437A (en) * 2018-04-04 2018-09-07 科大讯飞股份有限公司 A kind of virtual image generation method, device, equipment and readable storage medium storing program for executing
WO2020037679A1 (en) * 2018-08-24 2020-02-27 太平洋未来科技(深圳)有限公司 Video processing method and apparatus, and electronic device
CN109584145A (en) * 2018-10-15 2019-04-05 深圳市商汤科技有限公司 Cartoonize method and apparatus, electronic equipment and computer storage medium
CN111008935A (en) * 2019-11-01 2020-04-14 北京迈格威科技有限公司 Face image enhancement method, device, system and storage medium
CN111598818A (en) * 2020-04-17 2020-08-28 北京百度网讯科技有限公司 Face fusion model training method and device and electronic equipment
CN111652828A (en) * 2020-05-27 2020-09-11 北京百度网讯科技有限公司 Face image generation method, device, equipment and medium
CN111709874A (en) * 2020-06-16 2020-09-25 北京百度网讯科技有限公司 Image adjusting method and device, electronic equipment and storage medium
CN111860167A (en) * 2020-06-18 2020-10-30 北京百度网讯科技有限公司 Face fusion model acquisition and face fusion method, device and storage medium
CN111768356A (en) * 2020-06-28 2020-10-13 北京百度网讯科技有限公司 Face image fusion method and device, electronic equipment and storage medium

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113628239A (en) * 2021-08-16 2021-11-09 百度在线网络技术(北京)有限公司 Display optimization method, related device and computer program product
CN113628239B (en) * 2021-08-16 2023-08-25 百度在线网络技术(北京)有限公司 Display optimization method, related device and computer program product
CN115222895A (en) * 2022-08-30 2022-10-21 北京百度网讯科技有限公司 Image generation method, device, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination