CN116824020A

CN116824020A - Image generation method and device, apparatus, medium, and program

Info

Publication number: CN116824020A
Application number: CN202311082786.1A
Authority: CN
Inventors: Name withheld on request
Original assignee: Beijing Shengshu Technology Co ltd
Current assignee: Beijing Shengshu Technology Co ltd
Priority date: 2023-08-25
Filing date: 2023-08-25
Publication date: 2023-09-29

Abstract

Embodiments of the present disclosure disclose an image generation method and apparatus, a device, a medium, and a program. The method includes: acquiring an image generation task input by a user; outputting a demand prompt message and acquiring a demand reply message input by the user based on the demand prompt message; determining an image generation requirement based on the demand reply message; determining an image display scheme based on the image generation requirement and outputting description information of the image display scheme; and, in response to receiving a confirmation message from the user for the image display scheme, generating an image based on the image display scheme confirmed by the user and outputting the image. An image can therefore be generated without the user having to form a clear idea or design scheme for the desired image or to convert it into a prompt that can be submitted to the model, which improves image generation efficiency and user experience.

Description

Image generation method and device, apparatus, medium, and program

Technical Field

The present disclosure relates to artificial intelligence technology, natural language technology, and data processing technology, and more particularly, to an image generation method and apparatus, device, medium, and program.

Background

With the development of artificial intelligence (AI) technology, generative AI models show great potential in content creation. In the art and design industry, users wish to use generative models for content creation work, for example by submitting text prompts to a generative model, from which various types of images, such as product designs, illustrations, and animations, are generated.

Disclosure of Invention

The embodiment of the disclosure provides an image generation method and device, equipment, medium and program for generating an image.

In one aspect of the embodiments of the present disclosure, there is provided an image generating method including:

acquiring an image generation task input by a user;

outputting a demand prompt message, and acquiring a demand reply message input by a user based on the demand prompt message;

determining an image generation requirement based on the requirement reply message;

determining an image display scheme based on the image generation requirement, and outputting description information of the image display scheme;

and generating an image based on the image display scheme confirmed by the user and outputting the image in response to receiving the confirmation message of the user on the image display scheme.

Optionally, in any method embodiment of the present disclosure, after acquiring the image generating task input by the user, the method further includes:

determining a demand prompt message corresponding to the image generation task;

outputting the demand prompt message comprises:

and outputting a demand prompt message corresponding to the image generation task.

Optionally, in any method embodiment of the disclosure, the requirement prompting message includes any one or more of the following information related to the image generating task: task type, target audience, expression topic, style, color, shooting angle, brightness effect, expressed visual sensation, expressed emotion, expressed information, and expressed environment.

Optionally, in any method embodiment of the disclosure, determining an image presentation scheme based on the image generation requirement includes:

determining at least one image display scheme based on the image generation requirement, and outputting description information of the at least one image display scheme, wherein each of the at least one image display scheme comprises any one or more of the following scheme information: overall description information of the image, the theme of the image, the image presentation environment, the main visual elements, and the presentation style;

in response to receiving a confirmation message from the user for the image display scheme, generating an image based on the image display scheme confirmed by the user comprises:

responsive to receiving a confirmation message sent by a user for a target image presentation scheme of the at least one image presentation scheme, an image is generated based on the target image presentation scheme.

Optionally, in any method embodiment of the disclosure, further comprising:

responding to a scheme modification message sent by a user aiming at a target image display scheme in the at least one image display scheme, and acquiring a scheme modification requirement in the scheme modification message;

determining a modified image display scheme based on the target image display scheme and the scheme modification requirement, and outputting the description information of the modified image display scheme;

in response to receiving a confirmation message sent by the user for the modified image display scheme, taking the modified image display scheme as the image display scheme confirmed by the user, performing the operation of generating an image based on the image display scheme confirmed by the user, and outputting the image;

and in response to receiving a scheme modification message sent by a user for the modified image display scheme, replacing the target image display scheme with the modified image display scheme, and iteratively executing the operation of acquiring the scheme modification requirement in the scheme modification message.
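As a non-limiting illustration, the iterative confirmation and modification of an image display scheme described above may be sketched as follows. The sketch is written in Python under stated assumptions: the function name `confirm_scheme`, the callables `next_message` and `revise`, the message kinds "confirm" and "modify", and the `max_rounds` bound are all hypothetical and are not defined by the present disclosure.

```python
from typing import Callable, Tuple


def confirm_scheme(
    scheme: str,
    next_message: Callable[[str], Tuple[str, str]],
    revise: Callable[[str, str], str],
    max_rounds: int = 10,  # assumed safety bound; the disclosure sets no limit
) -> str:
    """Return the scheme the user finally confirms.

    next_message(scheme) shows the scheme description to the user and returns
    ("confirm", "") or ("modify", <scheme modification requirement>).
    revise(scheme, requirement) stands in for the language model that derives
    the modified image display scheme.
    """
    for _ in range(max_rounds):
        kind, requirement = next_message(scheme)
        if kind == "confirm":
            return scheme
        # The modified scheme replaces the target scheme for the next round.
        scheme = revise(scheme, requirement)
    raise RuntimeError("no confirmation received within max_rounds")
```

In such a sketch, each modification message is applied to the most recently modified scheme, so successive modification requirements accumulate until a confirmation message arrives.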

Optionally, in any method embodiment of the disclosure, in response to receiving a confirmation message from the user for the image display scheme, generating an image based on the image display scheme confirmed by the user includes:

responding to the received confirmation message of the user on the image display scheme, determining scheme details aiming at the image display scheme confirmed by the user, and outputting prompt words and description information of the scheme details;

responsive to receiving a user confirmation message for the solution details, an image is generated based on the user-confirmed image presentation solution and the user-confirmed solution details.

Optionally, in any method embodiment of the disclosure, the prompting words of the scheme details include any one or more of the following details and at least one attribute value of each detail: scheme refinement points and implementation modes of the scheme refinement points, image shape, image size, image resolution and image direction;

responsive to receiving a user confirmation message for the solution details, generating an image based on the user-confirmed image presentation solution and the user-confirmed solution details, comprising:

in response to receiving a confirmation message sent by a user based on a target attribute value of each item of detail determined by at least one attribute value of each item of detail, taking the determined target attribute value of each item of detail as a scheme detail confirmed by the user, and generating an image based on an image display scheme confirmed by the user and the scheme detail confirmed by the user.

Optionally, in any method embodiment of the disclosure, when the solution refinement point includes a presentation style, the attribute value of the presentation style includes a text identifier and/or a style representative image of the presentation style.
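As a non-limiting illustration, scheme details with candidate attribute values, and the selection of one target attribute value per detail, may be represented as follows. This is a minimal Python sketch under stated assumptions: the names `DetailItem` and `resolve_details` are hypothetical, the example values are invented for illustration, and falling back to the first candidate when the user makes no choice is an assumption of the sketch rather than a behavior stated in the present disclosure.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class DetailItem:
    name: str              # e.g. "image size" or "presentation style"
    candidates: List[str]  # at least one attribute value offered to the user


def resolve_details(items: List[DetailItem], choices: Dict[str, str]) -> Dict[str, str]:
    """Map each detail to its target attribute value chosen by the user."""
    confirmed: Dict[str, str] = {}
    for item in items:
        # Assumption: default to the first candidate if the user did not choose.
        value = choices.get(item.name, item.candidates[0])
        if value not in item.candidates:
            raise ValueError(f"{value!r} is not an offered value for {item.name!r}")
        confirmed[item.name] = value
    return confirmed
```

The returned mapping corresponds to the scheme details confirmed by the user, which would then be passed to image generation together with the confirmed image display scheme.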

Optionally, in any method embodiment of the disclosure, in response to receiving a confirmation message from the user for the image display scheme, generating an image based on the image display scheme confirmed by the user, further includes:

responding to receiving an adjustment message sent by a user to the scheme details, and acquiring adjustment requirements in the adjustment message;

determining adjusted scheme details based on the scheme details and the adjustment requirements, and outputting prompt words and description information of the adjusted scheme details;

responding to a confirmation message sent by a user to the adjusted scheme details, taking the adjusted scheme details as the scheme details confirmed by the user, executing the image display scheme based on the user confirmation and the scheme details confirmed by the user to generate an image, and outputting the image;

and in response to receiving an adjustment message sent by a user for the adjusted scheme details, iteratively executing the operation of acquiring the adjustment requirement in the adjustment message for the adjusted scheme details.

Optionally, in any method embodiment of the disclosure, generating the image based on the user-confirmed image presentation scheme and the user-confirmed scheme details includes:

inputting scheme information of a user-confirmed image display scheme and prompt words of user-confirmed scheme details into an image generation model so that the image generation model generates an image based on the scheme information of the user-confirmed image display scheme and the prompt words of the user-confirmed scheme details;

and receiving an image output by the image generation model.

Optionally, in any method embodiment of the present disclosure, the image generating model generates an image based on the plan information of the image display plan confirmed by the user and the prompt word of the plan detail confirmed by the user, including:

and the image generation model generates an initial image based on the image display scheme confirmed by the user, and adjusts the initial image based on the prompting words of the scheme details confirmed by the user to obtain an optimized image.

Optionally, in any method embodiment of the disclosure, after generating the image based on the image presentation scheme confirmed by the user, the method further includes:

displaying the image;

in response to receiving a first modification message of a user to the image, determining a first image modification scheme based on the first modification message, wherein the first image modification scheme comprises a modification scheme of an image display scheme and/or a modification scheme of scheme details, and outputting description information of the first image modification scheme;

In response to receiving a confirmation message sent by a user for the first image modification scheme, modifying the image display scheme confirmed by the user and/or scheme details confirmed by the user based on the first image modification scheme, and generating an image according to the modified image display scheme and the modified scheme details;

in response to receiving a second modification message sent by the user for the first image modification scheme, taking the second modification message as a new first modification message and iteratively executing the operation of determining a first image modification scheme based on the first modification message;

outputting the image, comprising:

and outputting the image in response to receiving an output indication message of the user on the image.

Optionally, in any method embodiment of the disclosure, the image generating task includes task text that needs to generate an image.

Optionally, in any method embodiment of the disclosure, the image generating task includes: a base image and a task text for the base image;

after acquiring the image generation task input by the user, the method further comprises the following steps:

performing image recognition on the basic image to obtain an image recognition result;

determining an image presentation scheme based on the image generation requirements, comprising:

Determining an image presentation scheme based on the image generation requirements and the image recognition result;

generating an image based on the user-validated image presentation scheme, comprising:

an image is generated based on the base image and a user-validated image presentation scheme.

In another aspect of the embodiments of the present disclosure, there is provided an image generating apparatus including:

the acquisition module is used for acquiring an image generation task input by a user;

the output module is used for outputting a demand prompt message;

the acquisition module is further used for acquiring a demand reply message input by a user based on the demand prompt message;

a first determining module for determining an image generation requirement based on the requirement reply message;

a second determining module for determining an image display scheme based on the image generation requirement;

the output module is also used for outputting the description information of the image display scheme;

the acquisition module is also used for receiving a confirmation message of the user on the image display scheme;

the image generation module is used for responding to the acquisition module to receive the confirmation message of the user on the image display scheme and generating an image based on the image display scheme confirmed by the user;

The output module is also used for outputting the image.

In still another aspect of the embodiments of the present disclosure, there is provided another image generating apparatus including:

the interaction module is used for acquiring an image generation task input by a user, calling a language model to determine a demand prompt message corresponding to the image generation task, and outputting the demand prompt message; acquiring a demand reply message input by a user based on the demand prompt message, calling the language model to determine an image generation demand based on the demand reply message, determining an image display scheme and description information of the image display scheme based on the image generation demand, and outputting the description information of the image display scheme; in response to receiving a confirmation message of a user for the image display scheme, invoking an image generation model to generate an image based on the image display scheme confirmed by the user, and outputting the image;

the language model is used for determining a demand prompt message corresponding to the image generation task, determining an image generation demand based on the demand reply message, and determining an image display scheme and description information of the image display scheme based on the image generation demand;

The image generation model is used for generating an image based on an image display scheme confirmed by a user.
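As a non-limiting illustration, the division of work among the interaction module, the language model, and the image generation model may be sketched as follows. The Python sketch rests on stated assumptions: all class and method names (`LanguageModel`, `ImageGenerationModel`, `InteractionModule`, `on_task`, `on_reply`, `on_confirm`, and so on) are hypothetical, and for brevity the image display scheme and its description information are represented by a single string.

```python
from typing import Optional, Protocol


class LanguageModel(Protocol):
    def demand_prompt(self, task: str) -> str: ...
    def requirement(self, reply: str) -> str: ...
    def scheme_description(self, requirement: str) -> str: ...


class ImageGenerationModel(Protocol):
    def generate(self, scheme: str) -> bytes: ...


class InteractionModule:
    """Relays user messages and calls the two models; holds no model logic."""

    def __init__(self, language_model: LanguageModel, image_model: ImageGenerationModel) -> None:
        self.language_model = language_model
        self.image_model = image_model
        self._scheme: Optional[str] = None

    def on_task(self, task: str) -> str:
        # Returns the demand prompt message to output to the user.
        return self.language_model.demand_prompt(task)

    def on_reply(self, reply: str) -> str:
        # Derives the requirement, then the scheme description to output.
        requirement = self.language_model.requirement(reply)
        self._scheme = self.language_model.scheme_description(requirement)
        return self._scheme

    def on_confirm(self) -> bytes:
        # Generates the image only for a scheme that has been proposed.
        if self._scheme is None:
            raise RuntimeError("no image display scheme has been proposed yet")
        return self.image_model.generate(self._scheme)
```

Because the two models are typed as protocols, any objects providing these methods could be substituted, including a single AI model object that implements both roles.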

In yet another aspect of the disclosed embodiments, there is provided an electronic device including:

a memory for storing a computer program;

and a processor configured to execute the computer program stored in the memory, and when the computer program is executed, implement the method according to any one of the embodiments of the present disclosure.

In yet another aspect of the disclosed embodiments, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the method of any of the embodiments of the disclosure.

In yet another aspect of embodiments of the present disclosure, a computer program product is provided, comprising computer program instructions which, when executed by a processor, implement the method of any of the embodiments of the present disclosure.

According to the image generation method and apparatus, device, medium, and program provided by the embodiments of the present disclosure, when a user inputs only an image generation task, the user can be guided by the output demand prompt message to input a demand reply message, from which the image generation requirement of the image generation task is determined. A corresponding image display scheme is then determined based on the image generation requirement, and description information of the image display scheme is output. After the user confirms the image display scheme, an image is generated based on the confirmed scheme and output. Throughout this process, an image can be generated without the user having to articulate a clear idea, design scheme, or desired design elements for the image and convert them into a clear prompt that can be submitted to a model, which improves image generation efficiency and user experience. Because the image generation requirement is determined from the demand reply message input by the user, and the image display scheme used to generate the image is determined from that requirement, the image is generated according to the user's actual needs rather than solely from a prompt input by the user, so that the generated image better matches the effect the user wants and image generation quality and efficiency are improved. Articulating a clear idea, design scheme, and desired design elements for an image and converting them into a clear prompt that can be submitted to a model requires in-depth expertise in fields such as art, design, and photography; the embodiments of the present disclosure remove this restriction for users unfamiliar with such expertise, lower the threshold for using AI-based image generation, and broaden its range of application.

The technical scheme of the present disclosure is described in further detail below through the accompanying drawings and examples.

Drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.

The disclosure may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:

FIG. 1 is a flow chart of one embodiment of an image generation method of the present disclosure;

FIG. 2 is a flow chart of another embodiment of the image generation method of the present disclosure;

FIG. 3 is a flow chart of yet another embodiment of the image generation method of the present disclosure;

FIG. 4 is a flow chart of yet another embodiment of an image generation method of the present disclosure;

FIG. 5 is a flow chart of yet another embodiment of the image generation method of the present disclosure;

FIG. 6 is a flow chart of yet another embodiment of an image generation method of the present disclosure;

FIG. 7 is a schematic view of a structure of an embodiment of an image generating apparatus of the present disclosure;

FIG. 8 is a schematic view of another embodiment of an image generating apparatus of the present disclosure;

FIG. 9 is a schematic structural view of a further embodiment of an image generating apparatus of the present disclosure;

FIG. 10 is a schematic view of a structure of still another embodiment of an image generating apparatus of the present disclosure;

FIG. 11 is a schematic structural view of an application embodiment of the electronic device of the present disclosure.

Detailed Description

Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.

It will be appreciated by those of skill in the art that the terms "first," "second," etc. in embodiments of the present disclosure are used merely to distinguish between different steps, devices or modules, etc., and do not represent any particular technical meaning nor necessarily logical order between them.

It should also be understood that in embodiments of the present disclosure, "plurality" may refer to two or more, and "at least one" may refer to one, two or more.

It should also be appreciated that any component, data, or structure referred to in the presently disclosed embodiments may be generally understood as one or more without explicit limitation or the contrary in the context.

In addition, the term "and/or" in this disclosure merely describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may indicate: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" in the present disclosure generally indicates an "or" relationship between the objects before and after it.

It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and that the same or similar features may be referred to each other, and for brevity, will not be described in detail.

Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.

The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.

Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.

It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.

Embodiments of the present disclosure may be applicable to electronic devices such as terminal devices, computer systems, servers, etc., which may operate with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with the terminal device, computer system, server, or other electronic device include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments that include any of the foregoing, and the like.

Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.

In the related art, when a user generates an image with a generative model, the user needs to have a clear idea or design scheme for the image to be generated, clearly express the desired design elements, and convert them into a clear text prompt that can be submitted to the generative model. This requires in-depth expertise in fields such as art, design, and photography, and such a threshold limits the widespread application of image generation technology. Non-professional ordinary users generally lack the ability to devise a design scheme: even when they have a clear need, they find it difficult to imagine which design elements should express it, and therefore find it difficult to obtain good results from a generative model. For example, a user may want to create a Chinese-style game character, but may not think of using a text prompt such as "a fighter wearing gorgeous ancient armor, holding a long, decorated sword in his hand, against the background of a grand, intricately carved traditional Chinese building with a tranquil garden" to express the desired design elements.

FIG. 1 is a flow chart of one embodiment of an image generation method of the present disclosure. As shown in fig. 1, the image generation method of this embodiment includes:

102, acquiring an image generation task input by a user.

In the embodiments of the present disclosure, the image generation task may also be referred to as the image generation purpose, and may be obtained from an image generation task message input by the user. For example, in one specific example, the user inputs the image generation task message "I want to make a library poster"; the message may be recognized and a preset task keyword extracted from it to obtain the image generation task "library poster".

In addition, in a specific implementation, the user can directly input the text-form image generation task message, and at this time, the text-form image generation task message can be directly subjected to preset task keyword extraction to obtain the image generation task. Or, the voice-form image generation task message may be input, and at this time, the voice-form image generation task message may be first subjected to voice recognition to obtain a voice recognition result text, and then the voice recognition result text may be subjected to extraction of a preset task keyword to obtain an image generation task.
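As a non-limiting illustration, extraction of a preset task keyword from a text-form image generation task message (or from a speech recognition result text) may be sketched as follows. The Python sketch rests on stated assumptions: the keyword list `PRESET_TASK_KEYWORDS` and the function name `extract_image_generation_task` are hypothetical, and longest-substring matching is only one possible extraction strategy that the present disclosure does not prescribe.

```python
from typing import Optional

# Hypothetical preset task keywords; the disclosure does not enumerate them.
PRESET_TASK_KEYWORDS = ["library poster", "poster", "avatar", "illustration"]


def extract_image_generation_task(message: str) -> Optional[str]:
    """Return the longest preset task keyword contained in the message, if any."""
    text = message.lower()
    matches = [keyword for keyword in PRESET_TASK_KEYWORDS if keyword in text]
    # Prefer the most specific (longest) keyword, e.g. "library poster" over "poster".
    return max(matches, key=len) if matches else None
```

For a voice-form message, the same function would be applied to the text produced by speech recognition, which is outside the scope of this sketch.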

The foregoing are merely examples of possible implementations; the embodiments of the present disclosure do not limit the specific manner of acquiring the image generation task input by the user.

104, outputting the demand prompt message and acquiring a demand reply message input by the user based on the demand prompt message.

The demand prompt message is used for prompting the user about the specific demand of the image to be generated, and the user can reply to the specific demand by inputting the demand reply message based on the demand prompt message.

106, determining the image generation requirement based on the requirement reply message.

108, determining an image display scheme based on the image generation requirement, and outputting the description information of the image display scheme.

The image display scheme is the whole information or main design element of the image expression and is used for determining the main visual content of the image.

110, in response to receiving a user confirmation message for the image presentation scheme, generating an image based on the image presentation scheme confirmed by the user, and outputting the image.
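As a non-limiting illustration, operations 102 to 110 may be strung together as in the following Python sketch. It rests on stated assumptions: the names `Scheme`, `run_image_generation`, and every callable parameter are hypothetical stand-ins for the model and user-interface components, and treating the literal reply "confirm" as the confirmation message is a convention of the sketch only.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Scheme:
    description: str  # description information output to the user


def run_image_generation(
    task: str,                                      # 102: task input by the user
    ask_user: Callable[[str], str],                 # outputs a message, returns the reply
    make_prompt: Callable[[str], str],              # task -> demand prompt message
    derive_requirement: Callable[[str, str], str],  # (task, reply) -> requirement
    propose_scheme: Callable[[str], Scheme],        # requirement -> display scheme
    generate_image: Callable[[Scheme], bytes],      # confirmed scheme -> image
) -> Optional[bytes]:
    # 104: output the demand prompt message and acquire the demand reply message.
    reply = ask_user(make_prompt(task))
    # 106: determine the image generation requirement from the reply.
    requirement = derive_requirement(task, reply)
    # 108: determine the image display scheme and output its description.
    scheme = propose_scheme(requirement)
    answer = ask_user(scheme.description)
    # 110: generate and return the image only after the user confirms.
    if answer.strip().lower() == "confirm":
        return generate_image(scheme)
    return None
```

Passing the model calls in as parameters keeps the sketch neutral as to whether one AI model or separate language and image generation models perform the operations.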

Optionally, in some implementations, operations 102-108 may be performed by a pre-trained language model, such as a large language model (LLM), and the image may be generated by an image generation model based on the image display scheme confirmed by the user; alternatively, operations 102-110 (i.e., the entire flow of the image generation method) may be performed by a single AI model, which is not limited by the embodiments of the present disclosure. Unless otherwise stated, the following embodiments take as an example the case in which operations 102-108 are performed by a language model and the image is generated by an image generation model. The description applies equally to implementations in which the entire flow is implemented by a single AI model (which may be understood as including the language model and the image generation model), and this is not repeated below.

In the embodiments of the present disclosure, through extensive learning and training and a deep understanding of the interaction context, the language model can accurately understand the image generation task input by the user, the image generation requirement conveyed by the demand reply message, and the user's feedback on the image display scheme and on the generated image, and can respond to each message input by the user. The language model can learn a large amount of expertise in art, design, photography, and the like in the training stage, and can therefore determine a corresponding image display scheme for various image generation requirements. Having been trained on a large corpus, it also has natural-language expression and organization capabilities, and can express the description information of an image display scheme in natural language.

In the embodiments of the present disclosure, the user can describe the image to be generated in natural language. The AI model or language model understands the task and requirements input by the user and provides a preliminary image display scheme; after the user confirms the scheme, the image is generated based on the confirmed scheme. The image is thus generated according to the user's actual needs, on the basis of an understanding of those needs, rather than solely from a prompt input by the user, so that the generated image better meets the user's actual needs.

According to this embodiment, when a user inputs only an image generation task, the user is guided by the output demand prompt message to input a demand reply message, from which the image generation requirement of the image generation task is determined. A corresponding image display scheme is then determined based on the image generation requirement, the description information of the image display scheme is output, and, after the user confirms the image display scheme, an image is generated based on the confirmed scheme and output. Because the image generation requirement is determined from the demand reply message input by the user, and the image display scheme used to generate the image is determined from that requirement, the image is generated according to the user's actual needs rather than solely from a prompt input by the user, so that the generated image better matches the effect the user wants and image generation quality and efficiency are improved. Articulating a clear idea, design scheme, and desired design elements for an image and converting them into a clear prompt that can be submitted to a model requires in-depth expertise in fields such as art, design, and photography; the embodiments of the present disclosure remove this restriction for users unfamiliar with such expertise, lower the threshold for using AI-based image generation, and broaden its range of application.

In practical implementations, different image generation tasks may have different image generation requirements, so a corresponding demand prompt message needs to be output for each task. For example, when the image generation task is a poster, the image generation requirement may need to specify the item type (e.g., brand, activity, website, social media), the target audience (design goals, age, gender, interests, etc.), the brand style (drawing, illustration, etc.), the information content (the information and feeling to be conveyed, etc.), and so forth; when the image generation task is an avatar, the image generation requirement may need to specify age, sex, face shape, hairstyle, decoration, color, and the like.
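As a non-limiting illustration, the correspondence between an image generation task and the content its demand prompt message asks for may be sketched as a lookup table. The Python sketch rests on stated assumptions: the table `REQUIRED_FIELDS`, the function names `missing_fields` and `build_demand_prompt`, and the wording of the generated message are hypothetical; in the embodiments a language model, not a fixed table, may determine this content.

```python
from typing import Dict, Iterable, List

# Field names follow the poster and avatar examples; the table itself is assumed.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "poster": ["item type", "target audience", "brand style", "information content"],
    "avatar": ["age", "sex", "face shape", "hairstyle", "decoration", "color"],
}


def missing_fields(task: str, known: Iterable[str] = ()) -> List[str]:
    """List the fields of the task that the user has not yet specified."""
    known_set = set(known)
    return [name for name in REQUIRED_FIELDS.get(task, []) if name not in known_set]


def build_demand_prompt(task: str, known: Iterable[str] = ()) -> str:
    """Build a demand prompt message, or an empty string if nothing is missing."""
    fields = missing_fields(task, known)
    if not fields:
        return ""
    return f"To generate the {task}, please specify: {', '.join(fields)}."
```

A table of this kind makes the task-specific nature of the prompt explicit; tasks absent from the table simply yield no prompt in this sketch.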

Optionally, in some implementations, after the image generating task input by the user is acquired through operation 102, a requirement prompting message corresponding to the image generating task or a requirement prompting message for the image generating task may be determined. Accordingly, in operation 104, a demand hint message corresponding to the image generation task is output.
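The mapping from an image generation task to its requirement prompt message can be illustrated with a minimal sketch. In the disclosure this determination is made by the language model; the lookup table below merely stands in for it. The names `REQUIREMENT_FIELDS`, `DEFAULT_FIELDS` and `build_requirement_prompt` are hypothetical, and the field lists are taken from the poster and avatar examples above.

```python
# Hypothetical lookup table standing in for the language model's knowledge.
# Field lists follow the poster and avatar examples in the text.
REQUIREMENT_FIELDS = {
    "poster": ["project type", "target audience", "brand style", "information content"],
    "avatar": ["age", "gender", "face shape", "hairstyle", "decoration", "color"],
}
# Assumed fallback for task types that are not in the table.
DEFAULT_FIELDS = ["target audience", "theme", "style"]


def build_requirement_prompt(task_type: str) -> str:
    """Return a requirement prompt message listing the fields to clarify."""
    fields = REQUIREMENT_FIELDS.get(task_type, DEFAULT_FIELDS)
    lines = [f"To generate your {task_type}, please provide:"]
    lines += [f"- {field}" for field in fields]
    return "\n".join(lines)


print(build_requirement_prompt("poster"))
```

A real system would generate the questions from the task text itself, so the table is only a way to make the task-specific nature of the prompt message concrete.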

Optionally, in some implementations, the requirement prompting message may include, for example, but not limited to, any one or more of the following information related to the image generation task: task type, target audience, presentation theme, style (i.e., presentation style), color, angle of capture, brightness effect, visual sensation of presentation, emotion of presentation, information of presentation, environment of presentation, etc. Specific information of the demand-prompt message needs to be determined according to the demand of the image generation task, and the embodiment of the disclosure does not limit the specific information included in the demand-prompt message.

In the embodiment of the disclosure, the language model learns a great deal of expertise in art, design, photography and the like in the training stage. It can therefore determine what information needs to be clarified for various image generation tasks and generate corresponding demand prompt messages, so that the user provides demand reply messages for that specific information and the image generation demand is determined through the demand reply messages.

In the embodiment of the disclosure, the language model may interact with the user conversationally. After acquiring the user's image generation task, the language model may output a demand prompt message asking, for example, to whom the image is to be displayed (i.e., the target audience), what theme and information the image is to express (i.e., the expressed theme and the expressed information), what visual sensation is to be expressed (i.e., the expressed visual sensation), what style is preferred (i.e., style), and so on. Through the dialogue with the user, the language model learns the user's demand from the conversation content; that is, the image generation demand is determined based on the demand reply message of the user.

Optionally, in some implementations, at least one image presentation scheme may be determined based on the image generation requirements and the description information of the at least one image presentation scheme may be output in operation 108. Wherein each image presentation scheme of the at least one image presentation scheme may include, for example, but not limited to, any one or more of the following scheme information: image overall description information, image theme, image presentation environment, main visual elements, presentation style, etc.

After determining the image generation requirement, the language model can use the expertise in art, design, photography and the like learned in advance to determine an image display scheme according to the image generation task and the image generation requirement, for example, the overall description of the image, its theme, its environment, the main visual elements used, the style characteristics presented, and the like. It can then use its pre-learned natural language expression and organization capability to generate and output the description information of the image display scheme. If the user is satisfied with the image presentation scheme, an image may be generated using that scheme; if the user believes that the image presentation scheme needs further improvement or optimization, the user may continue to guide the language model through the dialogue to modify the scheme until the user is satisfied, and the image is then generated using the image presentation scheme confirmed by the user.

Accordingly, in operation 110, in response to receiving a confirmation message sent by the user for a target image presentation scheme of the at least one image presentation scheme, an image is generated based on the target image presentation scheme. The target image display scheme is an image display scheme confirmed by a user in the at least one image display scheme, for example, the user can select one image display scheme from the at least one image display scheme as the image display scheme adopted by the user for confirmation, or the user can input a scheme Identification (ID) of the one image display scheme adopted by the user for confirmation to confirm the image display scheme adopted by the user, wherein each scheme ID is used for uniquely identifying one image display scheme in the at least one image display scheme. In a specific implementation, the scheme ID may be information of a scheme number, a scheme sequence, a scheme title, etc. of one image display scheme in the at least one image display scheme, which is not limited by the embodiment of the present disclosure.
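Resolving the target image presentation scheme from a user-supplied scheme ID can be sketched as follows. This is a minimal illustration under the assumption that the ID is either the scheme number or the scheme title; the `PresentationScheme` class and `resolve_scheme` function are hypothetical names, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PresentationScheme:
    number: int        # scheme number shown to the user
    title: str         # scheme title shown to the user
    description: str   # overall description of the scheme


def resolve_scheme(schemes: List[PresentationScheme], scheme_id: str) -> Optional[PresentationScheme]:
    """Return the scheme whose number or title matches the scheme ID, else None."""
    key = scheme_id.strip().lower()
    for scheme in schemes:
        if key == str(scheme.number) or key == scheme.title.lower():
            return scheme
    return None


schemes = [
    PresentationScheme(1, "Stacked books", "An abstract illustration of piled books."),
    PresentationScheme(2, "Graduation cap", "Books arranged as a graduation cap."),
]
print(resolve_scheme(schemes, "2").title)
```

Returning `None` for an unknown ID leaves it to the caller to ask the user again, which matches the dialogue-driven flow described above.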

Fig. 2 is a flowchart of another embodiment of the image generation method of the present disclosure. As shown in fig. 2, the image generation method of this embodiment includes:

202, an image generation task input by a user is acquired.

204, outputting the demand prompt message, and acquiring a demand reply message input by the user based on the demand prompt message.

206, determining the image generation requirement based on the requirement reply message.

208, determining at least one image display scheme based on the image generation requirement, and outputting the description information of the at least one image display scheme.

In response to receiving the confirmation message sent by the user for the target image presentation scheme of the at least one image presentation scheme, performing operation 210 with the target image presentation scheme as the image presentation scheme confirmed by the user. Otherwise, operation 212 is performed in response to receiving a scheme modification message sent by the user for a target image presentation scheme of the at least one image presentation scheme.

210, generating an image based on the image presentation scheme confirmed by the user, and outputting the image.

212, obtaining the scheme modification requirement in the scheme modification message.

214, determining a modified image display scheme based on the target image display scheme and the scheme modification requirement, and outputting description information of the modified image display scheme.

In response to receiving the confirmation message sent by the user for the modified image presentation scheme, operation 210 is performed with the modified image presentation scheme as the user-confirmed image presentation scheme. Otherwise, in response to receiving a scheme modification message sent by the user for the modified image presentation scheme, operation 212 is iteratively performed.

In the embodiment of the disclosure, the language model can accurately understand the scheme modification requirement in the scheme modification message, and can redetermine the modified image display scheme based on the target image display scheme and the scheme modification requirement by utilizing professional knowledge of art, design, photography and the like learned in the training stage.

In the embodiment of the disclosure, one or more image display schemes can be determined based on the image generation requirement. The user can directly select one of the at least one image display scheme as the confirmed image display scheme for generating the image. Alternatively, the user can select one of the at least one image display scheme and send a scheme modification message expressing modification opinions; the language model then modifies the selected image display scheme until the user is satisfied, and the image is generated using the image display scheme confirmed by the user.
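The confirm-or-modify loop of operations 208 to 214 can be sketched as a small function. This is an illustrative sketch only: user messages are modeled as hypothetical `(kind, text)` pairs, and `toy_revise` is a stand-in for the language model's revision of the scheme.

```python
from typing import Callable, Iterable, Tuple

# Each user message is a (kind, text) pair; kind is "confirm" or "modify".
Message = Tuple[str, str]


def confirm_scheme(
    scheme: str,
    messages: Iterable[Message],
    revise: Callable[[str, str], str],
) -> str:
    """Repeat operations 212 and 214 until a confirmation message arrives."""
    for kind, text in messages:
        if kind == "confirm":
            return scheme                  # proceed to operation 210
        if kind == "modify":
            scheme = revise(scheme, text)  # operations 212 and 214
    raise RuntimeError("dialogue ended without a confirmation message")


def toy_revise(scheme: str, requirement: str) -> str:
    # Stand-in for the language model: append the requested change.
    return f"{scheme}; {requirement}"


final = confirm_scheme(
    "abstract library illustration",
    [("modify", "use warmer colors"), ("confirm", "")],
    toy_revise,
)
print(final)  # abstract library illustration; use warmer colors
```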

Fig. 3 is a flowchart of yet another embodiment of the image generation method of the present disclosure. As shown in fig. 3, on the basis of any one of the above embodiments of the image generating method, generating an image based on the image presentation scheme confirmed by the user may include:

302, determining scheme details for the image display scheme confirmed by the user, and outputting prompt words and description information of the scheme details.

In response to receiving the user's confirmation message of the details of the scheme, operation 304 is performed.

Optionally, in some of these implementations, operation 302 may be performed by a language model.

304, generating an image based on the user-validated image presentation plan and the user-validated plan details.

Optionally, in some of these implementations, operation 304 may be performed by an image generation model.

When an image is generated by an image generation model, the text prompt submitted to the model needs to draw on professional knowledge of art, design and photography in order to improve image generation quality. A general user lacks such knowledge and may not be able to provide a professional text prompt, so that even if the user has a design scheme in mind, it is difficult to obtain an ideal image generation effect.

In this embodiment, the language model learns a great deal of expertise in art, design, photography and the like in the training stage. After the user confirms the image display scheme, the language model is further invoked to refine it: scheme details are determined based on the image display scheme confirmed by the user, and prompt words and description information of the scheme details are output. The prompt words and description information include expressions in professional terms; for example, the description information of one scheme detail may describe the picture style as "digital illustration, emphasizing the exquisite design of armor and buildings with high contrast and vivid colors". After the user confirms the scheme details according to the prompt words and description information, the image can be generated through the image generation model. This effectively improves image generation quality, makes the generated image better match the effect the user wants to express, and improves image generation efficiency and user experience.

Optionally, in some implementations, the prompt words of the scheme details may include, for example, but are not limited to, any one or more of the following details and at least one attribute value of each detail: at least one scheme refinement point and the implementation of each scheme refinement point, image shape, image size, image resolution, image orientation, and so forth. The at least one scheme refinement point may include, for example, but is not limited to, any one or more of the following: expressed theme, presentation style, color, shooting angle, brightness effect, expressed visual sensation, expressed information, expressed environment, and the like. The embodiments of the present disclosure do not limit the specific details, scheme refinement points, and the like.

Accordingly, in operation 304, in response to receiving a confirmation message sent by the user for the target attribute value of each detail, where the target attribute value is determined by the user from the at least one attribute value of that detail, the target attribute values of the details are taken as the scheme details confirmed by the user, and an image is generated based on the image presentation scheme confirmed by the user and the scheme details confirmed by the user.

Taking the case where the scheme refinement point is the presentation style as an example, the presentation style may include, for example, but is not limited to: photography, painting, illustration, sculpture, artwork, paper, 3D, and the like. Accordingly, the implementation of the scheme refinement point is a detailed description of how the style is realized, for example: when the presentation style is photography, the shooting mode adopted (such as macro, fish-eye, portrait, etc.), the camera model used, the camera settings, etc.; when the presentation style is painting, a detailed description of the materials and working materials used; when the presentation style is rendering, the engine used and its settings; when the presentation style is illustration, the type of illustration (e.g., digital illustration); when the presentation style is artwork, the material of the artwork (such as wood); and other information that may be defined as an output type. Accordingly, the attribute value of a detail is the specific parameter value of that detail, for example: when the presentation style is photography, which shooting mode (macro, fish-eye, portrait, etc.) is adopted, which camera model is used, what the camera setting parameter values are, etc.; when the presentation style is rendering, the specific parameter values set for the engine used. The embodiments of the present disclosure are not limited in this regard.

Wherein the image shape, i.e. the shape that the image appears to take on, may include, for example, but is not limited to: horizontal, vertical, square, circular, horizontal oval, vertical oval, etc. The image size is the image size that corresponds to the image shape. Thus, the appropriate image shape and image size can be recommended to the user.

In a specific implementation, when the scheme refinement point includes a presentation style, the attribute value of the presentation style includes any one or more of the following: a text identification (ID) of the presentation style, and a style representative image. The text ID of the presentation style is the name of the presentation style in text form, such as photography, drawing, illustration, and the like. The style representative image is an example image that exhibits the presentation style, such as a specific photography example, drawing example, illustration example, and so forth.

In the embodiment of the disclosure, after the scheme details are determined, multiple details and at least one attribute value of each detail can be fed back to the user. The user can select the desired attribute value from the at least one attribute value of each detail as the target attribute value of that detail; for each scheme refinement point, the user can specifically select the desired implementation and its attribute value. The target attribute values determined by the user are then used for generating the image. If each detail has only one attribute value and each scheme refinement point has only one implementation, the user does not need to make any selection and can directly feed back a confirmation message.
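The selection of target attribute values can be sketched as follows, assuming the candidate values for each detail are held in a dictionary. The function name `select_targets` and the data layout are hypothetical; the rule that a detail with a single candidate needs no user choice follows the paragraph above.

```python
from typing import Dict, List


def select_targets(options: Dict[str, List[str]], choices: Dict[str, str]) -> Dict[str, str]:
    """Pick one target attribute value per detail.

    Details with a single candidate value need no user choice; for the
    others, the user's choice must be one of the offered values.
    """
    targets = {}
    for detail, values in options.items():
        if len(values) == 1:
            targets[detail] = values[0]
        elif choices.get(detail) in values:
            targets[detail] = choices[detail]
        else:
            raise ValueError(f"no valid choice for detail: {detail}")
    return targets


options = {
    "presentation style": ["photography", "painting", "illustration"],
    "image shape": ["vertical"],
    "image size": ["A2", "A3"],
}
print(select_targets(options, {"presentation style": "illustration", "image size": "A2"}))
```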

Because a user's choice of presentation style depends to a certain extent on seeing images in that style rather than only imagining it from the dialogue, the embodiment of the disclosure can provide style representative images for the various presentation styles through a human-computer interaction interface. By viewing the style representative images and selecting the desired presentation style as the target presentation style, the user can choose the desired image style intuitively and quickly, which helps improve user friendliness and image generation efficiency.

Fig. 4 is a flow chart of yet another embodiment of the image generation method of the present disclosure. As shown in fig. 4, on the basis of any one of the above embodiments of the image generating method, generating an image based on the image presentation scheme confirmed by the user may include:

402, determining scheme details for the image display scheme confirmed by the user, and outputting prompt words and description information of the scheme details.

In response to receiving the user's confirmation message of the details of the scheme, operation 404 is performed. Otherwise, operation 406 is performed in response to receiving the adjustment message sent by the user for the scheme details.

404, generating an image based on the user-validated image presentation scheme and the user-validated scheme details.

406, obtaining the adjustment requirement in the adjustment message.

408, determining the adjusted scheme details based on the scheme details and the adjustment requirements, and outputting prompt words and description information of the adjusted scheme details.

In response to receiving the confirmation message sent by the user to the adjusted solution details, operation 404 is performed with the adjusted solution details as the solution details confirmed by the user. Otherwise, in response to receiving the adjustment message sent by the user for the adjusted solution details, operation 406 is iteratively performed for the adjusted solution details.

In the embodiment of the disclosure, after the scheme details are determined for the image display scheme confirmed by the user, the prompt words and the description information of the scheme details can be output, and after the user confirms the scheme details according to the prompt words and the description information, the image can be generated through the image generation model. If the user is not satisfied with the scheme details, the scheme details can be further optimized (for example, by selecting a more desirable presentation style) through dialogue interaction with the language model until scheme details that satisfy the user are obtained; an image is then generated by the image generation model based on the image display scheme confirmed by the user and the scheme details confirmed by the user. Conversational interaction allows the user to adjust the scheme details in real time and to optimize the prompt words, so that the image generation model is guided by more accurate and professional language. Therefore, the image generation quality can be effectively improved, the generated image better matches the effect the user wants to express, and the image generation efficiency and the user experience are improved.

Alternatively, in some of these implementations, in operation 304 or 404, the plan information of the user-confirmed image presentation plan and the prompt word of the user-confirmed plan detail may be input into the image generation model, so that the image generation model generates an image based on the plan information of the user-confirmed image presentation plan and the prompt word of the user-confirmed plan detail, and receives an image output from the image generation model.
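How the scheme information and the prompt words might be combined into a single input for the image generation model can be sketched as below. The disclosure does not specify the input format, so the joined text prompt, the function name `build_model_input` and the example keys are all assumptions made for illustration.

```python
from typing import Dict


def build_model_input(scheme_info: Dict[str, str], prompt_words: Dict[str, str]) -> str:
    """Join scheme information and prompt words into one text prompt."""
    scheme_part = ". ".join(f"{key}: {value}" for key, value in scheme_info.items())
    detail_part = ", ".join(f"{key}: {value}" for key, value in prompt_words.items())
    return f"{scheme_part}. {detail_part}"


scheme_info = {"theme": "library poster", "main visual elements": "stacked books"}
prompt_words = {"presentation style": "art illustration", "image shape": "vertical"}
print(build_model_input(scheme_info, prompt_words))
```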

In this embodiment, after the user confirms the image display scheme and the scheme details, the image generation model obtained by training in advance is invoked, the deep learning capability of the image generation model is fully utilized, and the corresponding image is generated based on the scheme information of the image display scheme confirmed by the user and the prompt word of the scheme details confirmed by the user, so that the image generation quality and efficiency can be improved.

In the embodiment of the disclosure, the image generation model may be trained by using a plurality of image generation samples in advance, wherein each image generation sample includes scheme information and a prompt word of scheme details of an image display scheme.

In one implementation, the image generation model may be trained using a Generative Adversarial Network (GAN), which includes a generator and a discriminator. The scheme information and the prompt words in each image generation sample are each encoded into vectors and input into the adversarial network, where the generator and the discriminator compete with each other: the generator generates images from random noise based on the input vectors, and the discriminator judges whether the images generated by the generator are real or fake. The generator learns from the feedback of the discriminator and continuously improves the generated images, while the discriminator becomes more accurate as the generator improves; through this adversarial learning, the images generated by the generator come closer and closer to real images. During the adversarial learning, the generator may use a diffusion technique to gradually add more detail to a preliminarily generated low-resolution sketch, so that the image becomes richer in detail, more vivid, and close to the effect of a real image. Through the competition and adversarial learning of the generative adversarial network, the refined image can be kept closely matched with the scheme information and prompt words in each image generation sample.
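The disclosure does not state a training objective. For reference only, the standard conditional GAN objective is written below under assumed notation: G is the generator, D the discriminator, c the encoded scheme information and prompt words, x a real image paired with c, and z the random noise.

```latex
\min_{G}\,\max_{D}\; V(D, G) =
  \mathbb{E}_{(x, c) \sim p_{\text{data}}}\bigl[\log D(x, c)\bigr]
  + \mathbb{E}_{z \sim p_{z},\, c \sim p_{\text{data}}}\bigl[\log\bigl(1 - D(G(z, c), c)\bigr)\bigr]
```

The discriminator is trained to increase V by scoring real pairs high and generated pairs low, while the generator is trained to decrease V, which corresponds to the competition described above.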

After the training of the generative adversarial network is completed, the generator can be used as the image generation model, so that the scheme information of an image display scheme and the prompt words of its scheme details can be used to generate an image that closely matches them, is rich in detail, is more vivid, and is close to the effect of a real image.

Alternatively, in some implementations, when the image generation model generates an image based on the scheme information of the user-confirmed image display scheme and the prompt word of the user-confirmed scheme detail, the image may be generated directly based on the scheme information of the user-confirmed image display scheme and the prompt word of the user-confirmed scheme detail; or generating an initial image based on the image display scheme confirmed by the user, and then adjusting the initial image based on the prompting words of the scheme details confirmed by the user to obtain an optimized image. The embodiments of the present disclosure are not limited in this regard.

Optionally, in a specific implementation, when the image generation model generates an image based on the scheme information of the image display scheme confirmed by the user and the prompt words of the scheme details confirmed by the user, the image generation model may first generate at least one preview image and display it to the user. After the user selects one of the preview images as the confirmed preview image, image enhancement is performed on the confirmed preview image, for example by increasing the image resolution and optimizing image details, so as to improve the quality of the finally output image. If the image generation model initially generates only one preview image, the user can directly decide whether to use it. If the user is not satisfied with the preliminarily generated preview image, the image display scheme and the scheme details can be adjusted through the following embodiments and a preview image generated again for the user. This avoids the long generation time and poor user experience of directly generating a high-quality image in one pass, as well as the computing resources and video memory resources wasted when the user is not satisfied with that image.
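The preview-then-enhance flow can be sketched with toy data. In this sketch an image is a hypothetical nested list of pixel values, and nearest-neighbour upscaling stands in for the real image enhancement step, which the disclosure does not specify.

```python
from typing import List

Image = List[List[int]]  # toy grayscale image: rows of pixel values


def upscale(image: Image, factor: int) -> Image:
    """Nearest-neighbour upscaling, a stand-in for real image enhancement."""
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


def finalize(previews: List[Image], chosen_index: int, factor: int = 2) -> Image:
    """Enhance only the preview image that the user confirmed."""
    return upscale(previews[chosen_index], factor)


previews = [[[0, 1], [2, 3]], [[9, 8], [7, 6]]]
print(finalize(previews, 1))
```

Only the confirmed preview is enhanced, so the cost of producing a high-quality image is paid once, after the user has accepted the composition.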

Fig. 5 is a flowchart of yet another embodiment of the image generation method of the present disclosure. As shown in fig. 5, after generating an image, on the basis of any one of the above embodiments of the image generating method, the method may further include:

502, an image is displayed.

The image may be one image or a plurality of images. If there are a plurality of images, the user may select one of them and send the output instruction message or a modification message for it.

And in response to receiving the output instruction message of the user for the image, performing an operation of outputting the image. Otherwise, operation 504 is performed in response to receiving the first modification message of the image by the user.

504, a first image modification scheme is determined based on the first modification message, and description information of the first image modification scheme is output.

The first image modification scheme includes a modification scheme of the image display scheme and/or a modification scheme of the scheme details. That is, the first image modification scheme may be a modification scheme of the image display scheme, a modification scheme of the scheme details, or both at the same time; the specific modification scheme is determined according to the content (i.e., the modification requirement) of the modification message sent by the user.

Operation 506 is performed in response to receiving the confirmation message sent by the user for the first image modification scheme. Otherwise, operation 508 is performed in response to receiving the second modification message sent by the user for the first image modification scheme. The terms first modification message and second modification message are used here only to distinguish different modification messages.

506, modifying the image display scheme confirmed by the user and/or the scheme details confirmed by the user based on the first image modification scheme, and generating an image by using the modified image display scheme and the scheme details, namely, using the modified image display scheme and the scheme details as the image display scheme confirmed by the user and the scheme details confirmed by the user to generate the image.

The specific implementation of generating the image according to the modified image display scheme and the scheme details may refer to the specific implementation of generating the image according to the user-confirmed image display scheme and the user-confirmed scheme details in the above embodiment of the disclosure, which is not described herein.

508, iteratively performing operation 504 with the second modification message as a new first modification message.

In this embodiment, after the image is generated, the image (which may be a directly generated high-quality image or the preview image) may be displayed. If the user is satisfied with the displayed image, the user may send an output instruction message for the image so that the image is output. If the image is a high-quality image generated directly by the image generation model, it may be output directly for the user to download, save, forward, or otherwise use; if the image is a preview image preliminarily generated by the image generation model, it may be output after image enhancement for the user to download, save, forward, or otherwise use. If the user is not satisfied with the displayed image, the image can be modified through natural language dialogue, which greatly improves the user experience.
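Operation 506 can be sketched as merging a confirmed image modification scheme into the current scheme and scheme details. The dictionary layout with optional "scheme" and "details" parts is an assumption chosen to reflect the "and/or" in the text, and `apply_modification` is a hypothetical name.

```python
from typing import Dict, Tuple

Fields = Dict[str, str]


def apply_modification(
    scheme: Fields, details: Fields, modification: Dict[str, Fields]
) -> Tuple[Fields, Fields]:
    """Return updated copies of the confirmed scheme and scheme details.

    `modification` may carry a "scheme" part, a "details" part, or both.
    """
    new_scheme = {**scheme, **modification.get("scheme", {})}
    new_details = {**details, **modification.get("details", {})}
    return new_scheme, new_details


scheme = {"theme": "library poster", "main visual elements": "stacked books"}
details = {"presentation style": "art illustration", "image size": "A2"}
print(apply_modification(scheme, details, {"details": {"image size": "A3"}}))
```

The updated pair would then be treated as the user-confirmed scheme and scheme details for regenerating the image.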

Alternatively, in the image generating method of any of the embodiments of the present disclosure, the image generating task may be a task text that needs to generate an image. Alternatively, the image generation task may also include both a base image and task text for the base image, where the base image is an image provided by the user that requires reference or processing.

Fig. 6 is a flow chart of yet another embodiment of the image generation method of the present disclosure. As shown in fig. 6, when the image generation task includes a base image and task text for the base image, the image generation method of this embodiment includes:

602, acquiring an image generation task input by a user.

The image generation task includes a base image and task text for the base image. For example, the base image is a photograph of the user, and the task text for the base image is "generate a cartoon head image".

Thereafter, operations 604-606 and operation 608 may be performed, respectively.

604, outputting a demand prompt message, and acquiring a demand reply message input by a user based on the demand prompt message.

606, determining an image generation requirement based on the requirement reply message.

608, performing image recognition on the base image to obtain an image recognition result.

The image recognition result may be an image recognition result associated with the task text, for example, if the task text of the base image is "generating a cartoon head image", the image recognition result may be a user head image in the base image and related information (such as a position, a pixel area, etc.); if the task text of the base image is "add a bundle of flowers on a table in the image", then the image recognition result may be the table in the base image and its related information (e.g., color, location, pixel area, etc.); etc., to which embodiments of the present disclosure are not limited.

The operations 604-606 and the operation 608 are not limited in execution sequence, and may be executed simultaneously, or may be executed in any sequence or any time difference, which is not limited by the embodiments of the present disclosure.

610, determining an image display scheme based on the image generation requirement and the image recognition result, and outputting description information of the image display scheme.

612, in response to receiving a user confirmation message for the image presentation scheme, generating an image based on the base image and the user-confirmed image presentation scheme, and outputting the image.

Based on the present embodiment, multimodal data including images and text may be processed, an image desired by a user may be generated based on a base image provided by the user, for example, a cartoon head image may be generated based on a real head image provided by the user (as a base image), some elements may be added to the base image provided by the user, or some elements may be replaced, removed, and so on.
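Since operations 604-606 and operation 608 have no required execution order, one possible arrangement is to run them concurrently. The sketch below uses Python's standard thread pool; `determine_requirement` and `recognize_image` are hypothetical stand-ins for the language model dialogue and the image recognition step, and their return values are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor


def determine_requirement(reply: str) -> dict:
    # Stand-in for operations 604-606 (dialogue with the language model).
    return {"requirement": reply}


def recognize_image(base_image: bytes, task_text: str) -> dict:
    # Stand-in for operation 608 (image recognition on the base image).
    return {"target": "head", "bytes": len(base_image), "task": task_text}


def prepare(base_image: bytes, task_text: str, reply: str):
    """Run both branches concurrently and return their results for operation 610."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        requirement = pool.submit(determine_requirement, reply)
        recognition = pool.submit(recognize_image, base_image, task_text)
        return requirement.result(), recognition.result()


print(prepare(b"\x00\x01", "generate a cartoon head image", "cute style"))
```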

The image generating method according to the embodiment of the present disclosure is further described below by taking a specific application example as an example:

S11, a user inputs a task message through a human-computer interaction interface, for example: I want to make a library poster.

S12, the language model acquires the image generation task "library poster" input by the user based on the task message sent by the user, determines that the requirement prompt message corresponding to the image generation task needs to include a project type, a target audience, a brand style and information content, and outputs the requirement prompt message to the user, for example: I would be very happy to help you. First, I need more specific information about your project. Could you provide the following details:

Project type: Is it for a brand, an activity, a website, or social media? Or is it of another type?

Target audience: Who is your design aimed at? What are their age, gender, interests, etc.?

Brand style: If you already have a brand style, can you describe it? Or do you have a favorite color or pattern?

Information content: What information or feeling do you want to convey in the design?

S13, the user inputs a demand reply message based on the demand prompt message: I want to make a library poster; the audience is students, and the content is the last library check-in activity before graduation.

S14, the language model determines an image generation requirement based on the requirement reply message, determines an image display scheme based on the image generation requirement, and outputs the description information of the image display scheme, for example:

Very good. The following is a basic image presentation scheme for a library poster that I have created for you:

Graphic element: An abstract library illustration, which can be designed in the shape of a pile of books, creating an interesting visual focus with an academic atmosphere. Below the illustration, a capitalized title, such as "The last check-in, leaving knowledge as a trace", may be placed to highlight the importance of the activity.

Text element: In the middle of the poster, you can write the specific information of the activity, including date, time, and place. To make it easy to read, a clear, legible font, such as a sans-serif font, may be selected. The font color may be dark (e.g., dark blue or black) so that it is clearly visible on a bright background.

S15, the user inputs a confirmation message through a human-computer interaction interface, for example: Let's go with this scheme; please add a style.

S16, the language model determines scheme details aiming at the image display scheme confirmed by the user, and outputs prompt words and description information of the scheme details.

The prompt words are, for example: presentation style: art illustration; image shape: vertical; image size: A2.

The description information is, for example:

Very good. Let us set the visual style of this design to a distinctive art illustration style, which will help your poster stand out from many others. This style can bring a visual effect full of creativity and artistic feeling.

Design style: art illustration. We can use the art illustration style to design the library image. The books are stacked to form a creative and attractive shape. This shape may be a huge graduation cap or a graduation certificate, which would symbolize graduation and remind students that this is their last library check-in. Furthermore, art illustration allows you more freedom in the use of color, making the poster more attractive.

Posters are typically vertically oriented, so we will design a vertically oriented poster. One common poster size is A2 (420 mm x 594 mm), which should be large enough for the poster to be clearly visible from a distance. At the same time, this size can easily be scaled down to A3 or A4 to suit different printing requirements.

S17, the user inputs a confirmation message through a human-computer interaction interface, for example: Very good, please generate it for me.

S18, the language model inputs the scheme information of the image display scheme confirmed by the user and the prompt words of the scheme details confirmed by the user into the image generation model.

S19, the image generation model generates an image based on the scheme information of the image display scheme confirmed by the user and the prompt words of the scheme details confirmed by the user and displays the generated image through a human-computer interaction interface.

S20, the user inputs a first modification message for the image through a human-computer interaction interface, for example: I would like it to be a little sunnier.

In the embodiment of the disclosure, the language model has been trained on a large amount of data, so that it can accurately understand the requirement in each message input by the user by combining a deep understanding of the interaction context, and can determine a corresponding scheme that meets the requirement by using professional knowledge of art, design, photography, and the like learned in the training stage.

S21, the language model determines a first image modification scheme based on the first modification message, wherein the first image modification scheme comprises a modification scheme of an image display scheme and a modification scheme of scheme details, and outputs description information of the first image modification scheme, for example:

Of course, let us adjust the design to a brighter, sunnier style. This will help create a more lively and enthusiastic atmosphere and stimulate students' enthusiasm for the graduation check-in activity. The following are some modification suggestions:

1. Color selection: We can use brighter hues to highlight the design. For example, the background color is adjusted from soft yellow and orange to a brighter, more vibrant yellow and orange. Meanwhile, the books in the illustration can also use more vivid colors, such as bright blue and bright green, to increase vitality and visual impact.

2. Graphic element: Our library illustration is adjusted to a lighter, livelier style. The books can be designed to fly or gush out of the graduation cap, forming a celebratory visual effect.

3. Picture element: If available, some sunny library photos are used, for example, students reading in the sunlight, or the exterior of the library in bright sunshine. These pictures can give people a pleasant, positive feeling and improve the brightness of the overall design.

S22, the user inputs a confirmation message for the first image modification scheme through the human-computer interaction interface, for example: Very good, please generate it for me.

S23, the language model modifies the image display scheme confirmed by the user and/or the scheme details confirmed by the user based on the first image modification scheme to obtain a modified image display scheme and modified scheme details, and inputs the scheme information of the modified image display scheme and the prompt words of the modified scheme details into the image generation model.

S24, the image generation model generates an image based on the scheme information of the modified image display scheme and the prompt words of the modified scheme details, and displays the generated image through a human-computer interaction interface.

S25, the user inputs a confirmation message for the image (for example: I am satisfied) through a human-computer interaction interface; alternatively, the user saves the image, or shares the image with friends within the application or to a third-party application, etc.
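The order of steps S11 to S25 can be read as a small state machine. The following is only an illustrative sketch under stated assumptions: the stage names, the message kinds ("task", "reply", "confirm", "modify"), and the function names are hypothetical and do not come from the disclosure, and in practice the classification of each user message would be done by the language model.

```python
from enum import Enum, auto


class Stage(Enum):
    """Hypothetical dialogue stages corresponding to steps S11 to S25."""
    AWAIT_TASK = auto()                  # before S11
    AWAIT_REQUIREMENTS = auto()          # after S12, waiting for S13
    AWAIT_SCHEME_CONFIRM = auto()        # after S14, waiting for S15
    AWAIT_DETAILS_CONFIRM = auto()       # after S16, waiting for S17
    IMAGE_SHOWN = auto()                 # after S19 or S24
    AWAIT_MODIFICATION_CONFIRM = auto()  # after S21, waiting for S22
    DONE = auto()                        # after S25


# (current stage, kind of user message) -> next stage
TRANSITIONS = {
    (Stage.AWAIT_TASK, "task"): Stage.AWAIT_REQUIREMENTS,
    (Stage.AWAIT_REQUIREMENTS, "reply"): Stage.AWAIT_SCHEME_CONFIRM,
    (Stage.AWAIT_SCHEME_CONFIRM, "confirm"): Stage.AWAIT_DETAILS_CONFIRM,
    (Stage.AWAIT_DETAILS_CONFIRM, "confirm"): Stage.IMAGE_SHOWN,
    (Stage.IMAGE_SHOWN, "modify"): Stage.AWAIT_MODIFICATION_CONFIRM,
    (Stage.AWAIT_MODIFICATION_CONFIRM, "confirm"): Stage.IMAGE_SHOWN,
    (Stage.IMAGE_SHOWN, "confirm"): Stage.DONE,
}


def advance(stage: Stage, message_kind: str) -> Stage:
    """Return the next stage, or raise if the message is unexpected here."""
    try:
        return TRANSITIONS[(stage, message_kind)]
    except KeyError:
        raise ValueError(
            f"unexpected {message_kind!r} message in stage {stage.name}"
        ) from None


def run(message_kinds):
    """Replay a sequence of user message kinds from the initial stage."""
    stage = Stage.AWAIT_TASK
    for kind in message_kinds:
        stage = advance(stage, kind)
    return stage
```

Replaying the example dialogue as `run(["task", "reply", "confirm", "confirm", "modify", "confirm", "confirm"])` ends in `Stage.DONE`. The modify/confirm pair can repeat any number of times before the final confirmation.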

Any of the image generation methods provided by the embodiments of the present disclosure may be performed by any suitable device having data processing capabilities, including, but not limited to: terminal equipment, servers, etc. Alternatively, any of the image generation methods provided by the embodiments of the present disclosure may be executed by a processor, such as the processor executing any of the image generation methods mentioned by the embodiments of the present disclosure by invoking corresponding instructions stored in a memory. This will not be described in detail below.

Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.

Fig. 7 is a schematic structural view of an embodiment of the image generating apparatus of the present disclosure. The image generating apparatus of this embodiment can be used to implement the above-described image generating method embodiments of the present disclosure. As shown in fig. 7, the image generating apparatus of this embodiment includes: an acquisition module 702, an output module 704, a first determination module 706, a second determination module 708, and an image generation module 710. Wherein:

The acquisition module 702 is configured to acquire an image generating task input by a user.

The output module 704 is configured to output the demand prompt message.

The acquisition module 702 is further configured to obtain a demand reply message input by the user based on the demand prompt message.

A first determining module 706 is configured to determine an image generation requirement based on the requirement reply message.

A second determination module 708 is configured to determine an image presentation scheme based on the image generation requirements.

The output module 704 is further configured to output description information of the image display scheme.

The acquisition module 702 is further configured to receive a confirmation message of the user to the image display scheme.

The image generation module 710 is configured to generate an image based on the image presentation scheme confirmed by the user in response to the acquisition module 702 receiving the confirmation message of the image presentation scheme from the user.

The output module 704 is further configured to output an image.

Fig. 8 is a schematic structural view of another embodiment of the image generating apparatus of the present disclosure. As shown in fig. 8, on the basis of the embodiment shown in fig. 7, the image generating apparatus of this embodiment further includes a third determining module 712, configured to determine a requirement prompt message corresponding to the image generating task. Correspondingly, the output module 704 is specifically configured to output the requirement prompt message corresponding to the image generating task.

Optionally, in some implementations, the second determining module 708 is specifically configured to determine at least one image display scheme based on the image generation requirement, where each image display scheme in the at least one image display scheme may include, for example, but is not limited to, any one or more of the following scheme information: image overall description information, image theme, image presentation environment, main visual elements, presentation style, etc. Accordingly, the output module 704 is specifically configured to output the description information of the at least one image display scheme. The image generation module 710 is specifically configured to generate an image based on the target image presentation scheme in response to the acquisition module 702 receiving a confirmation message sent by the user for the target image presentation scheme in the at least one image presentation scheme.
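The scheme information listed above can be pictured as a simple record, with the user's confirmation selecting one target scheme out of the candidates. This is a minimal sketch only: the class name `PresentationScheme`, its field names, and the index-based `select_target` helper are hypothetical and are not defined by the disclosure.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PresentationScheme:
    """Hypothetical container for the scheme information of one scheme."""
    overall_description: str
    theme: str = ""
    environment: str = ""
    main_visual_elements: List[str] = field(default_factory=list)
    style: str = ""

    def describe(self) -> str:
        """Render the description information shown to the user."""
        lines = [self.overall_description]
        if self.theme:
            lines.append(f"Theme: {self.theme}")
        if self.environment:
            lines.append(f"Environment: {self.environment}")
        if self.main_visual_elements:
            lines.append(
                "Main visual elements: " + ", ".join(self.main_visual_elements)
            )
        if self.style:
            lines.append(f"Style: {self.style}")
        return "\n".join(lines)


def select_target(schemes: List[PresentationScheme], confirmed_index: int) -> PresentationScheme:
    """Return the scheme the user confirmed (assumed here to be given by index)."""
    if not schemes:
        raise ValueError("at least one image presentation scheme is required")
    if not 0 <= confirmed_index < len(schemes):
        raise IndexError("confirmed scheme is not among the offered schemes")
    return schemes[confirmed_index]
```

Only the overall description is mandatory in this sketch, which mirrors the "any one or more of" wording; a real system would likely have the language model map the user's free-text confirmation to a scheme rather than take an index.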

Optionally, in still another embodiment of the image generating apparatus of the present disclosure, the acquisition module 702 is further configured to: in response to receiving a scheme modification message sent by the user for a target image presentation scheme in the at least one image presentation scheme, obtain a scheme modification requirement in the scheme modification message; determine a modified image presentation scheme based on the target image presentation scheme and the scheme modification requirement, and output description information of the modified image presentation scheme through the output module 704; in response to receiving a confirmation message sent by the user for the modified image presentation scheme, take the modified image presentation scheme as the user-confirmed image presentation scheme, instruct the image generation module 710 to generate an image based on the user-confirmed image presentation scheme, and output the image through the output module 704; and in response to receiving a scheme modification message sent by the user for the modified image presentation scheme, take the modified image presentation scheme as the new target image presentation scheme and iteratively execute the operation of obtaining the scheme modification requirement in the scheme modification message.
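The iterative modification described above amounts to a loop that keeps revising the current scheme until a confirmation arrives. The sketch below assumes each user message has already been classified as "modify" or "confirm"; the `revise` callable is a hypothetical stand-in for the language model, and all names are illustrative only.

```python
from typing import Callable, Iterable, Tuple

# (kind, text) where kind is "modify" or "confirm"; an assumed message format.
Message = Tuple[str, str]


def refine_until_confirmed(
    target_scheme: str,
    messages: Iterable[Message],
    revise: Callable[[str, str], str],
) -> str:
    """Apply scheme modification requirements until the user confirms."""
    current = target_scheme
    for kind, text in messages:
        if kind == "confirm":
            # The current scheme becomes the user-confirmed scheme.
            return current
        if kind == "modify":
            # The modified scheme replaces the target scheme for the next round.
            current = revise(current, text)
        else:
            raise ValueError(f"unknown message kind: {kind!r}")
    raise RuntimeError("conversation ended before the user confirmed a scheme")
```

For example, with `revise = lambda scheme, req: f"{scheme} + {req}"`, the messages `[("modify", "brighter colors"), ("confirm", "")]` applied to `"poster"` yield `"poster + brighter colors"`.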

Optionally, in some implementations, when instructing the image generation module 710 to generate an image based on the user-confirmed image presentation scheme in response to receiving a confirmation message from the user for the image presentation scheme, the acquisition module 702 is specifically configured to: in response to receiving the confirmation message from the user for the image presentation scheme, determine scheme details for the user-confirmed image presentation scheme, and output prompt words and description information of the scheme details; and in response to receiving a confirmation message from the user for the scheme details, instruct the image generation module 710 to generate an image based on the user-confirmed image presentation scheme and the user-confirmed scheme details.

Optionally, in some implementations, the foregoing prompt words of the scheme details may include, for example, but are not limited to, any one or more of the following details and at least one attribute value of each of the details: scheme refinement points and implementations of the scheme refinement points, image shape, image size, image resolution, image orientation, etc. When instructing the image generation module 710 to generate an image based on the user-confirmed image presentation scheme and the user-confirmed scheme details in response to receiving a confirmation message from the user for the scheme details, the acquisition module 702 is specifically configured to: in response to receiving a confirmation message sent by the user after determining a target attribute value of each detail from the at least one attribute value of that detail, take the determined target attribute values of the details as the user-confirmed scheme details, and instruct the image generation module 710 to generate an image based on the user-confirmed image presentation scheme and the user-confirmed scheme details.

Optionally, in some implementations, when the scheme refinement points include a presentation style, the attribute value of the presentation style may include a textual identifier of the presentation style and/or a representative image of the style.
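As a rough illustration of how target attribute values might be chosen from the candidate values of each detail and then rendered as prompt words, consider the sketch below. The function names, the dictionary layout, and the rule of defaulting to the first candidate when the user makes no choice are all assumptions made for this example, not part of the disclosure.

```python
from typing import Dict, List


def choose_target_values(
    candidates: Dict[str, List[str]],
    choices: Dict[str, str],
) -> Dict[str, str]:
    """Pick one target attribute value for each detail.

    Assumption: a detail the user did not choose for falls back to its
    first candidate value.
    """
    confirmed = {}
    for detail, values in candidates.items():
        if not values:
            raise ValueError(f"detail {detail!r} has no attribute values")
        value = choices.get(detail, values[0])
        if value not in values:
            raise ValueError(f"{value!r} is not a candidate for {detail!r}")
        confirmed[detail] = value
    return confirmed


def to_prompt_words(details: Dict[str, str]) -> str:
    """Render confirmed details in a 'detail: value; detail: value' form."""
    return "; ".join(f"{name}: {value}" for name, value in details.items())
```

With candidates for presentation style, image shape, and image size, and a single user choice of `{"image size": "A3"}`, the resulting prompt words keep the first candidate for the other two details and use A3 for the size.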

Optionally, in other implementations, when instructing the image generation module 710 to generate an image based on the user-confirmed image presentation scheme in response to receiving a confirmation message from the user for the image presentation scheme, the acquisition module 702 is further specifically configured to: in response to receiving an adjustment message sent by the user for the scheme details, obtain an adjustment requirement in the adjustment message; determine adjusted scheme details based on the scheme details and the adjustment requirement, and output prompt words and description information of the adjusted scheme details; in response to receiving a confirmation message sent by the user for the adjusted scheme details, take the adjusted scheme details as the user-confirmed scheme details, instruct the image generation module 710 to generate an image based on the user-confirmed image presentation scheme and the user-confirmed scheme details, and output the image through the output module 704; and in response to receiving an adjustment message sent by the user for the adjusted scheme details, iteratively execute, for the adjusted scheme details, the operation of obtaining the adjustment requirement in the adjustment message.

Optionally, in some implementations, the image generation module 710 is specifically an image generation model, and when instructing the image generation module 710 to generate an image based on the user-confirmed image presentation scheme and the user-confirmed scheme details, the acquisition module 702 is specifically configured to: input the scheme information of the user-confirmed image presentation scheme and the prompt words of the user-confirmed scheme details into the image generation model, so that the image generation model generates an image based on the scheme information and the prompt words; and receive the image output by the image generation model.

Optionally, in some of these implementations, the image generation model is specifically configured to: generate an initial image based on the image display scheme confirmed by the user, and adjust the initial image based on the prompt words of the scheme details confirmed by the user to obtain an optimized image.

Optionally, referring back to fig. 8, still another embodiment of the image generating apparatus of the present disclosure may further include a display module 714 for displaying the image. Accordingly, the output module 704 is specifically configured to output the image in response to the acquisition module 702 receiving an output instruction message from the user for the image displayed by the display module 714. The acquisition module 702 is further configured to: in response to receiving a first modification message from the user for the image, determine a first image modification scheme based on the first modification message, the first image modification scheme including a modification scheme of the image presentation scheme and/or a modification scheme of the scheme details, and output description information of the first image modification scheme through the output module 704; in response to receiving a confirmation message sent by the user for the first image modification scheme, modify the user-confirmed image presentation scheme and/or the user-confirmed scheme details based on the first image modification scheme, and instruct the image generation module 710 to generate an image with the modified image presentation scheme and the modified scheme details; and in response to receiving a second modification message sent by the user for the image modification scheme, take the second modification message as a new first modification message and iteratively perform the operation of determining the first image modification scheme based on the first modification message.
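The "and/or" in the first image modification scheme can be illustrated by treating the scheme and the scheme details as two dictionaries, either of which the modification may or may not touch. This is a hedged sketch: the dictionary representation and the keys "scheme" and "details" are assumptions chosen for the example.

```python
from copy import deepcopy
from typing import Dict, Tuple


def apply_modification(
    scheme: Dict[str, str],
    details: Dict[str, str],
    modification: Dict[str, Dict[str, str]],
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Return modified copies of the confirmed scheme and scheme details.

    `modification` may carry a "scheme" part, a "details" part, or both;
    a missing part leaves the corresponding input unchanged.
    """
    new_scheme = deepcopy(scheme)
    new_details = deepcopy(details)
    new_scheme.update(modification.get("scheme", {}))
    new_details.update(modification.get("details", {}))
    return new_scheme, new_details
```

Working on copies keeps the previously confirmed scheme available, which is convenient if the user rejects the regenerated image and a further modification round starts from the earlier state.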

Optionally, in some implementations, the image generating task includes task text that requires generating an image.

Optionally, in other implementations, the image generating task includes a base image and task text for the base image. Accordingly, the image generating apparatus of the above embodiment may further include an image recognition module 716, configured to perform image recognition on the base image to obtain an image recognition result. The acquisition module 702 is specifically configured to determine an image display scheme based on the image generation requirement and the image recognition result. The image generation module 710 is specifically configured to generate an image based on the base image and the user-confirmed image presentation scheme.
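The two task forms (task text only, or a base image plus task text) can be sketched as a record with an optional image, where recognition runs only when a base image is present. The names `ImageTask` and `scheme_context`, the dictionary keys, and the `recognize` callable (a stand-in for an image recognition module) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class ImageTask:
    """Hypothetical image generation task: text, optionally with a base image."""
    task_text: str
    base_image: Optional[bytes] = None


def scheme_context(
    task: ImageTask,
    requirement: str,
    recognize: Callable[[bytes], List[str]],
) -> Dict[str, object]:
    """Collect the inputs from which a presentation scheme would be determined."""
    context: Dict[str, object] = {
        "task": task.task_text,
        "requirement": requirement,
    }
    if task.base_image is not None:
        # Only multimodal tasks contribute an image recognition result.
        context["recognition"] = recognize(task.base_image)
    return context
```

For a text-only task the returned context has no "recognition" entry, so the same downstream scheme determination can serve both task forms.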

Fig. 9 is a schematic structural view of a further embodiment of the image generating apparatus of the present disclosure. The image generating apparatus of this embodiment can be used to implement the above-described image generating method embodiments of the present disclosure. As shown in fig. 9, the image generating apparatus of this embodiment includes: an interaction module 802, a language model 804, and an image generation model 806. Wherein:

The interaction module 802 is configured to: obtain an image generation task input by a user, invoke the language model 804 to determine a requirement prompt message corresponding to the image generation task, and output the requirement prompt message; acquire a demand reply message input by the user based on the requirement prompt message, invoke the language model 804 to determine an image generation demand based on the demand reply message and to determine an image display scheme and description information of the image display scheme based on the image generation demand, and output the description information of the image display scheme; and in response to receiving a user confirmation message for the image display scheme, invoke the image generation model 806 to generate an image based on the user-confirmed image display scheme, and output the image.

The language model 804 is used for determining a requirement prompt message corresponding to the image generating task, determining an image generating requirement based on the requirement reply message, and determining an image display scheme and description information of the image display scheme based on the image generating requirement.

An image generation model 806 for generating an image based on the user-validated image presentation scheme.
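The division of labor among the interaction module, the language model, and the image generation model in fig. 9 might be sketched as follows. This is an illustrative composition only: the method names, the `Protocol` interfaces, and the stub models are assumptions, and real models would replace the stubs.

```python
from typing import Optional, Protocol


class LanguageModel(Protocol):
    def requirement_prompt(self, task: str) -> str: ...
    def image_requirement(self, reply: str) -> str: ...
    def presentation_scheme(self, requirement: str) -> str: ...


class ImageModel(Protocol):
    def generate(self, scheme: str) -> bytes: ...


class InteractionModule:
    """Routes user messages to the two models and returns what is output."""

    def __init__(self, language_model: LanguageModel, image_model: ImageModel):
        self.language_model = language_model
        self.image_model = image_model
        self.pending_scheme: Optional[str] = None

    def on_task(self, task: str) -> str:
        return self.language_model.requirement_prompt(task)

    def on_reply(self, reply: str) -> str:
        requirement = self.language_model.image_requirement(reply)
        self.pending_scheme = self.language_model.presentation_scheme(requirement)
        return self.pending_scheme

    def on_confirm(self) -> bytes:
        if self.pending_scheme is None:
            raise RuntimeError("no image presentation scheme to confirm")
        return self.image_model.generate(self.pending_scheme)


class StubLanguageModel:
    """Toy stand-in; a real language model would go here."""

    def requirement_prompt(self, task: str) -> str:
        return f"Please describe your needs for: {task}"

    def image_requirement(self, reply: str) -> str:
        return reply.strip().lower()

    def presentation_scheme(self, requirement: str) -> str:
        return f"scheme for [{requirement}]"


class StubImageModel:
    """Toy stand-in that encodes the scheme instead of rendering an image."""

    def generate(self, scheme: str) -> bytes:
        return scheme.encode("utf-8")
```

A short usage example: create `InteractionModule(StubLanguageModel(), StubImageModel())`, call `on_task("library poster")`, then `on_reply(...)` with the user's answer, and finally `on_confirm()` to obtain the generated bytes. Keeping the models behind small interfaces lets either one be swapped without changing the interaction logic.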

Optionally, in some implementations, the language model 804 is further configured to determine a requirement prompt message corresponding to the image generating task. Accordingly, the interaction module 802 is specifically configured to output the requirement prompt message corresponding to the image generation task.

Optionally, in some implementations, the language model 804 is specifically configured to determine at least one image presentation scheme based on the image generation requirement, where each image presentation scheme in the at least one image presentation scheme may include, for example, but is not limited to, any one or more of the following scheme information: image overall description information, image theme, image presentation environment, main visual elements, presentation style, etc. Accordingly, the interaction module 802 is specifically configured to output the description information of the at least one image display scheme, and receive a confirmation message sent by the user for the target image display scheme in the at least one image display scheme. The image generation model 806 is specifically configured to generate an image based on the target image presentation scheme.

Optionally, in some implementations, the interaction module 802 is further configured to receive a solution modification message sent by the user for a target image presentation solution of the at least one image presentation solution. Correspondingly, the language model 804 is further configured to obtain a scheme modification requirement in the scheme modification message, determine a modified image display scheme based on the target image display scheme and the scheme modification requirement, and output description information of the modified image display scheme through the interaction module 802; the interaction module 802 is further configured to receive a confirmation message sent by the user for the modified image display scheme. Accordingly, the language model 804 is specifically configured to take the modified image presentation scheme as the image presentation scheme confirmed by the user, instruct the image generation model 806 to generate an image based on the image presentation scheme confirmed by the user, and output the image through the interaction module 802; in response to the interaction module 802 receiving a solution modification message sent by the user for the modified image presentation solution, iteratively executing an operation of obtaining a solution modification requirement in the solution modification message by replacing the target image presentation solution with the modified image presentation solution.

Optionally, in some implementations, in response to the interaction module 802 receiving a confirmation message of the user for the image display scheme, the language model 804 determines scheme details for the image display scheme confirmed by the user, and outputs a prompt word and description information of the scheme details; in response to the interaction module 802 receiving a user confirmation message of the solution details, the image generation model 806 is instructed to generate an image based on the user-confirmed image presentation solution and the user-confirmed solution details.

Optionally, in some implementations, the foregoing prompt words of the scheme details may include, for example, but are not limited to, any one or more of the following details and at least one attribute value of each of the details: scheme refinement points and implementations of the scheme refinement points, image shape, image size, image resolution, image orientation, etc. When the scheme refinement points include a presentation style, the attribute value of the presentation style may include a textual identifier of the presentation style and/or a representative image of the style.

The language model 804 is specifically configured to: in response to the interaction module 802 receiving a confirmation message sent by the user after determining a target attribute value of each detail from the at least one attribute value of that detail, take the determined target attribute values of the details as the user-confirmed scheme details, and instruct the image generation model 806 to generate an image based on the user-confirmed image presentation scheme and the user-confirmed scheme details.

Optionally, the language model 804 may be specifically further used to: in response to the interaction module 802 receiving an adjustment message sent by a user for the scheme details, acquiring an adjustment requirement in the adjustment message; determining adjusted scheme details based on the scheme details and the adjustment requirements, and outputting prompt words and description information of the adjusted scheme details through the interaction module 802; in response to the interaction module 802 receiving a confirmation message sent by the user to the adjusted solution details, taking the adjusted solution details as the solution details confirmed by the user, instructing the image generation model 806 to generate an image based on the image display solution confirmed by the user and the solution details confirmed by the user, and outputting the image through the interaction module 802; in response to the interaction module 802 receiving the adjustment message sent by the user for the adjusted solution details, the operation of obtaining the adjustment requirement in the adjustment message is iteratively performed for the adjusted solution details.

Optionally, in some implementations, the language model 804 is specifically configured to input the solution information of the user-confirmed image presentation solution and the prompt word of the user-confirmed solution details into the image generation model 806, so that the image generation model 806 generates an image based on the solution information of the user-confirmed image presentation solution and the prompt word of the user-confirmed solution details; the interaction module 802 is specifically configured to receive an image output by the image generation model 806.

Optionally, in some implementations, the image generation model 806 is specifically configured to generate an initial image based on the image presentation scheme confirmed by the user, and adjust the initial image based on the prompt word of the scheme detail confirmed by the user to obtain the optimized image.

Optionally, in some implementations, the interaction module 802 is further configured to display an image; and outputting the image in response to receiving an output instruction message of the user for the image; and receiving a first modification message of the image from the user; accordingly, the language model 804 is further configured to determine a first image modification scheme based on the first modification message, where the first image modification scheme includes a modification scheme of the image presentation scheme and/or a modification scheme of the scheme details, and output, by the interaction module 802, description information of the first image modification scheme. The interaction module 802 is further configured to receive a confirmation message sent by the user for the first image modification scheme; accordingly, language model 804 is further configured to modify the user-validated image presentation scheme and/or the user-validated scheme details based on the first image modification scheme and instruct image generation model 806 to generate an image with the modified image presentation scheme and the scheme details. The interaction module 802 is further configured to receive a second modification message sent by the user for the image modification scheme; accordingly, the language model 804 is further configured to iteratively perform an operation of determining the first image modification scheme based on the first modification message with the second modification message as a new first modification message.

Optionally, in other implementations, the image generating task includes a base image and task text for the base image. Fig. 10 is a schematic structural view of still another embodiment of the image generating apparatus of the present disclosure. As shown in fig. 10, on the basis of the embodiment shown in fig. 9, the image generating apparatus of this embodiment further includes an image recognition module 808, configured to perform image recognition on the base image in the image generation task to obtain an image recognition result. Accordingly, the language model 804 is specifically configured to determine an image display scheme based on the image generation requirement and the image recognition result; the image generation model 806 is specifically configured to generate an image based on the base image and the user-confirmed image presentation scheme.

The image generating apparatus and the image generating method in the embodiments of the disclosure correspond to each other in specific implementation, and their corresponding contents may be cross-referenced; details are not repeated here.

In addition, the embodiment of the disclosure also provides an electronic device, which comprises:

a memory for storing a computer program;

and a processor, configured to execute the computer program stored in the memory, and when the computer program is executed, implement the image generating method according to any one of the embodiments of the disclosure.

Fig. 11 is a schematic structural view of an application embodiment of the electronic device of the present disclosure. Next, an electronic device according to an embodiment of the present disclosure is described with reference to fig. 11. The electronic device may be either or both of the first device and the second device, or a stand-alone device independent thereof, which may communicate with the first device and the second device to receive the acquired input signals therefrom.

As shown in fig. 11, the electronic device includes one or more processors and memory.

The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform the desired functions.

The memory may store one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or nonvolatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory (cache), and the like. The non-volatile memory may include, for example, Read-Only Memory (ROM), hard disk, flash memory, and the like. One or more computer program products may be stored on the computer readable storage medium and run by a processor to implement the image generation methods and/or other desired functions of the various embodiments of the present disclosure described above.

In one example, the electronic device may further include: input devices and output devices, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).

In addition, the input device may include, for example, a keyboard, a mouse, and the like.

The output device may output various information including the determined distance information, direction information, etc., to the outside. The output device may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.

Of course, only some of the components of the electronic device relevant to the present disclosure are shown in fig. 11 for simplicity, components such as buses, input/output interfaces, and the like being omitted. In addition, the electronic device may include any other suitable components depending on the particular application.

In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in an image generation method according to various embodiments of the present disclosure described in the above section of the present description.

Program code of the computer program product for performing the operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's computing device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.

Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium, having stored thereon computer program instructions, which when executed by a processor, cause the processor to perform steps in an image generation method according to various embodiments of the present disclosure described in the above section of the present description.

The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.

The basic principles of the present disclosure have been described above in connection with specific embodiments. It should be noted, however, that the advantages, benefits, and effects mentioned in the present disclosure are merely examples and not limitations, and they are not to be considered as necessarily possessed by every embodiment of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only and are not intended to be limiting, since the disclosure is not necessarily limited to practice with the specific details described.

In this specification, the embodiments are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be cross-referenced. The description of the system embodiments is relatively brief because they essentially correspond to the method embodiments; for relevant points, reference should be made to the description of the method embodiments.

The block diagrams of the devices, apparatuses, equipment, and systems referred to in this disclosure are merely illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," and "having" are open-ended terms that mean "including but not limited to" and are used interchangeably therewith. The terms "or" and "and" as used herein refer to, and are used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."

The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
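As one illustration of a software implementation, the overall flow summarized in the abstract (acquire the task, output a demand prompt message, collect the reply, determine the image generation demand, describe an image display scheme, and generate the image only after the user confirms) might be sketched as follows. This is a minimal sketch and not the implementation of the disclosure: the names `DisplayScheme`, `build_demand_prompt`, and `run_image_generation`, the callback signatures, and the string formats are all hypothetical, and the image generation model is represented by a caller-supplied function.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Sequence


@dataclass
class DisplayScheme:
    """Hypothetical container for an image display scheme."""
    description: str             # description information shown to the user
    requirement: Dict[str, str]  # image generation demand it was derived from


def build_demand_prompt(task: str, fields: Sequence[str]) -> str:
    """Build a demand prompt message asking about each field (assumed format)."""
    return f"To generate '{task}', please describe: " + ", ".join(fields)


def run_image_generation(
    task: str,
    ask_user: Callable[[str], Dict[str, str]],         # outputs prompt, returns reply
    confirm: Callable[[str], bool],                    # user confirms the scheme
    generate_image: Callable[[DisplayScheme], object], # stand-in for the model
    fields: Sequence[str] = ("style", "color", "target audience"),
) -> Optional[object]:
    # Output the demand prompt message and acquire the demand reply message.
    reply = ask_user(build_demand_prompt(task, fields))

    # Determine the image generation demand: keep only answered fields.
    requirement = {k: v for k, v in reply.items() if v}

    # Determine an image display scheme and its description information.
    details = "; ".join(f"{k}={v}" for k, v in sorted(requirement.items()))
    scheme = DisplayScheme(description=f"{task} | {details}", requirement=requirement)

    # Generate the image only in response to the user's confirmation.
    if not confirm(scheme.description):
        return None
    return generate_image(scheme)
```

In this sketch the three callbacks keep user interaction and the generation model outside the control flow, so the same function could sit behind a chat interface or a test harness; a real system would presumably derive the scheme with a language model instead of string formatting.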

It is also noted that, in the apparatuses, devices, and methods of the present disclosure, components or steps may be decomposed and/or recombined. Such decomposition and/or recombination should be considered equivalent to the present disclosure.

The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the disclosure to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims

1. An image generation method, comprising:

acquiring an image generation task input by a user;

2. The method of claim 1, wherein after acquiring the user-entered image generation task, further comprising:

determining a demand prompt message corresponding to the image generation task;

outputting a demand prompt message, comprising:

3. The method of claim 1, wherein the demand prompt message includes any one or more of the following information related to the image generation task: task type, target audience, expression topic, style, color, shooting angle, brightness effect, expressed visual sensation, expressed emotion, expressed information, and expressed environment.

4. The method of claim 1, wherein determining an image presentation scheme based on the image generation requirement and outputting description information of the image presentation scheme, comprises:

5. The method of claim 4, further comprising:

6. The method of any of claims 1-5, wherein generating an image based on the user-validated image presentation scheme in response to receiving a user confirmation message for the image presentation scheme comprises:

7. The method of claim 6, wherein the reminder for the scheme detail includes any one or more of the following and at least one attribute value for each detail: scheme refinement points and implementation modes of the scheme refinement points, image shape, image size, image resolution and image direction;

8. The method of claim 7, wherein when the solution refinement point includes a presentation style, the attribute value of the presentation style includes at least one of a textual identification of the presentation style and a style representation image.

9. The method of claim 6, wherein generating an image based on the user-validated image presentation scheme in response to receiving a user confirmation message for the image presentation scheme, further comprises:

10. The method of claim 6, wherein generating an image based on the user-validated image presentation scheme and the user-validated scheme details comprises:

and receiving an image output by the image generation model.

11. The method of claim 10, wherein the image generation model generates an image based on the scheme information of the user-confirmed image presentation scheme and the prompt words of the user-confirmed scheme details, comprising:

12. The method of claim 6, wherein after generating the image based on the user-validated image presentation scheme, further comprising:

displaying the image;

in response to receiving a first modification message of a user to the image, determining a first image modification scheme based on the first modification message, the first image modification scheme including at least one of a modification scheme of an image presentation scheme and a modification scheme of scheme details, and outputting description information of the first image modification scheme;

in response to receiving a confirmation message sent by a user for the first image modification scheme, modifying at least one of a user-confirmed image display scheme and user-confirmed scheme details based on the first image modification scheme, and generating an image according to the modified image display scheme and the modified scheme details;

outputting the image, comprising:

13. The method of any of claims 1-5, wherein the image generation task comprises task text that requires an image to be generated.

14. The method of any of claims 1-5, wherein the image generation task comprises: a base image and a task text for the base image;

15. An image generating apparatus comprising:

the output module is used for outputting a demand prompt message;

the output module is further used for outputting the image.

16. An image generating apparatus comprising:

17. An electronic device, comprising:

a memory for storing a computer program product;

a processor for executing a computer program product stored in the memory, which, when executed, implements the method of any of claims 1-14.

18. A computer readable storage medium having stored thereon computer program instructions which, when executed by a processor, implement the method of any of claims 1-14.

19. A computer program product comprising computer program instructions which, when executed by a processor, implement the method of any of claims 1-14.