CN117992169A - Plane design display method based on AIGC technology - Google Patents
- Publication number
- CN117992169A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to the technical field of AIGC, and in particular to a plane design display method based on AIGC technology, comprising the following steps: input object features, appearance features, and dynamic features; analyze the object features with an encoder and output the result with a decoder; analyze the appearance features and dynamic features with an encoder, fuse the two analysis results, and output the result with the decoder. Compared with the traditional display mode, the invention can generate content in real time according to the requirements and descriptions of operators rather than simply playing pre-made video, which means that the display device is no longer limited to content prepared in advance and can generate content automatically according to the operators' actual needs.
Description
Technical Field
The invention relates to the technical field of AIGC, in particular to a plane design display method based on AIGC technology.
Background
Artificial Intelligence Generated Content (AIGC) technology refers to the use of artificial intelligence algorithms and techniques to generate various forms of content, such as articles, music, images, and video. Artificial intelligence models are trained on large amounts of data and can then generate new content based on the input conditions and requirements.
The conventional display mode generally copies prefabricated video to the platform for display and relies heavily on manual operation, which mainly causes the following problems:
1. Because the traditional display mode mainly plays pre-made video, the display device cannot generate content based on real-time images or descriptions from operators; the degree of intelligence is low, and the personalized requirements of different users cannot be met;
2. The traditional display mode cannot adjust in real time and generate content according to actual conditions, so it cannot adapt to the display requirements of different scenes;
3. Under the traditional display mode, an operator must make videos in advance and input them into the display device, and control and adjustment within the device require complicated manual operation, which increases the difficulty and complexity of operation and reduces the convenience of the display platform.
Therefore, there is an urgent need for a plane design display method based on AIGC technology that solves the above problems.
Disclosure of Invention
The invention aims to provide a plane design display method based on AIGC technology, which has the advantage of intelligence and solves the problems described in the background art.
In order to achieve the above purpose, the present invention provides the following technical solution: a plane design display method based on AIGC technology, comprising the following steps:
S1: Input object features, appearance features, and dynamic features. An encoder analyzes the object features and a decoder outputs the result; another encoder analyzes the appearance features and the dynamic features, the two analysis results are fused, and the decoder outputs the fused result. The analysis results of the input object features, appearance features, and dynamic features are then fused, forming the key entities. On this basis, dedicated one-to-one encoders analyze the specific object features, specific appearance features, and specific dynamic features respectively; after the dynamic entity features are obtained, the model splices them with the original dynamic features. Once the key entities are generated, the extracted image features are semantically enhanced through feature fusion to obtain rich information, and the interaction relationship between the key entities and the fused features is incorporated; global visual information is then connected with the description at the language level, and the final description is generated;
S2: Input text information and generate voice output with a speech synthesis engine. Further, speech can be synthesized by concatenation: recorded speech segments prepared in advance are joined according to the input text, a computer algorithm converts the text into audible speech output, and the synthesized speech is output through voice cloning;
S3: Input speech and recognize it with a speech recognizer; a recurrent neural network generates continuous text by learning the context dependencies of the sequence data, taking the output of the previous time step as the input of the current time step until the text is generated;
S4: After the text, image, and speech are generated, one encoder and two decoders analyze them. The encoder encodes the features of the text, image, and speech into a set of mutually aligned features; the decoders learn the associations between static and dynamic information and the relationships within the context, and the text, image, and speech information is cross-matched and fused to finally obtain the visual representation;
S5: An acquisition module collects the text, speech, and image information, a processing module processes the collected information, and the text, speech, and images are divided into different regions after processing. When the text, speech, and images are generated, a decoder analyzes them and a generation module fuses the information to generate a video or image, which is then displayed on an AIGC platform.
In the invention, in step S2, the specific steps of converting the input text into speech are as follows:
S2.1.1: Preprocess the input text, including removing punctuation marks, tokenizing into words or characters, and handling numbers and specific abbreviations;
S2.1.2: Select a speech synthesis model suited to the task requirements; this step includes converting the text into a phoneme sequence (or using word-level phoneme sequences) and analyzing linguistic features such as syllables, phonemes, and tones;
S2.1.3: Train an acoustic model using a large amount of text and corresponding speech data, so that the model learns the correlation between text and audio as well as the acoustic features of the audio;
S2.1.4: Based on the input text and the trained model, generate a speech waveform through the acoustic model to produce the synthesized speech;
S2.1.5: Perform post-processing on the generated speech, such as adjusting pitch, volume, and speech rate, to improve its quality and naturalness.
In the invention, in step S2, the specific steps of synthesizing speech from the input text using speech segments are as follows:
S2.2.1: Build a speech database, also called a speech library or speech-unit library, in advance; it contains a large number of recorded utterances covering the various possible phonemes, syllables, words, and phrases;
S2.2.2: Convert the input text into the corresponding phoneme sequence, either with a text-to-phoneme conversion tool or through the internal processing of the speech synthesis system;
S2.2.3: Based on the phoneme sequence, select from the speech library an appropriate speech segment for each phoneme, where each segment generally corresponds to one phoneme or a group of phonemes;
S2.2.4: Splice the selected speech segments together in order to form continuous speech output, adjusting and smoothing the segments as needed so that the spliced speech is fluent and natural;
S2.2.5: Perform post-processing on the generated speech output, such as adjusting pitch, volume, and speech rate, to improve its quality and naturalness.
Preferably, in step S2, the voice cloning includes a voice feature extraction module, which extracts features from the synthesized speech; signal processing techniques are used to smooth the speech, which is then output through voice cloning.
In the present invention, in step S3, before the generated text is output, it is checked as follows:
S3.1: First, set a check on whether the generated result fits the theme: after the text is generated, judge whether it matches the display theme; if it does, proceed to the next check; otherwise, re-edit and regenerate the text;
S3.2: Then, check whether the sentences of the generated result are fluent; if so, output the text; otherwise, re-edit and regenerate it.
The beneficial effects of the technical scheme of the application are as follows: compared with the traditional display mode, the application can generate content in real time according to the requirements and descriptions of operators rather than simply playing pre-made video, which means that the display device is no longer limited to content prepared in advance and can generate content automatically according to the operators' actual needs.
The operator can instruct the display device to generate content that meets their requirements by inputting information in various forms, such as pictures and text. In addition, the operator can interact with the display device through voice instructions without complicated manual operation: the display device recognizes and understands the operator's voice commands, adjusts the display content according to the instructions, or provides related information. This natural and convenient interaction mode greatly improves the user experience and makes the display process more intelligent and personalized.
Drawings
The accompanying drawings are included to provide a further understanding of the invention, are incorporated in and constitute a part of this specification, and together with the embodiments serve to explain the invention. In the drawings:
FIG. 1 is a flow chart of the AIGC technical system of the present invention;
FIG. 2 is a functional block diagram of text generation in accordance with the present invention;
FIG. 3 is a functional block diagram of speech generation according to the present invention;
Fig. 4 is a functional block diagram of image generation of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings, in order to better convey the technical content of the invention. The concepts and embodiments described here may be implemented in any of a wide variety of ways, and the illustrative embodiments shown are not limiting. All other embodiments obtained by those skilled in the art based on the embodiments of the invention, without inventive effort, fall within the scope of the invention.
As shown in fig. 1 to 4: the embodiment provides a plane design display method based on AIGC technology, which comprises the following steps:
S1: Input object features, appearance features, and dynamic features. An encoder analyzes the object features and a decoder outputs the result; another encoder analyzes the appearance features and the dynamic features, the two analysis results are fused, and the decoder outputs the fused result. The analysis results of the input object features, appearance features, and dynamic features are then fused, forming the key entities. On this basis, dedicated one-to-one encoders analyze the specific object features, specific appearance features, and specific dynamic features respectively; after the dynamic entity features are obtained, the model splices them with the original dynamic features. Once the key entities are generated, the extracted image features are semantically enhanced through feature fusion to obtain rich information, and the interaction relationship between the key entities and the fused features is incorporated; global visual information is then connected with the description at the language level, and the final description is generated;
S2: Input text information and generate voice output with a speech synthesis engine. Further, speech can be synthesized by concatenation: recorded speech segments prepared in advance are joined according to the input text, a computer algorithm converts the text into audible speech output, and the synthesized speech is output through voice cloning;
S3: Input speech and recognize it with a speech recognizer; a recurrent neural network generates continuous text by learning the context dependencies of the sequence data, taking the output of the previous time step as the input of the current time step until the text is generated;
S4: After the text, image, and speech are generated, one encoder and two decoders analyze them. The encoder encodes the features of the text, image, and speech into a set of mutually aligned features; the decoders learn the associations between static and dynamic information and the relationships within the context, and the text, image, and speech information is cross-matched and fused to finally obtain the visual representation;
S5: An acquisition module collects the text, speech, and image information, a processing module processes the collected information, and the text, speech, and images are divided into different regions after processing. When the text, speech, and images are generated, a decoder analyzes them and a generation module fuses the information to generate a video or image, which is then displayed on an AIGC platform.
In the invention, in step S2, the specific steps of converting the input text into speech are as follows:
S2.1.1: Preprocess the input text, including removing punctuation marks, tokenizing into words or characters, and handling numbers and specific abbreviations;
S2.1.2: Select a speech synthesis model suited to the task requirements; this step includes converting the text into a phoneme sequence (or using word-level phoneme sequences) and analyzing linguistic features such as syllables, phonemes, and tones;
S2.1.3: Train an acoustic model using a large amount of text and corresponding speech data, so that the model learns the correlation between text and audio as well as the acoustic features of the audio;
S2.1.4: Based on the input text and the trained model, generate a speech waveform through the acoustic model to produce the synthesized speech;
S2.1.5: Perform post-processing on the generated speech, such as adjusting pitch, volume, and speech rate, to improve its quality and naturalness.
In the invention, in step S2, the specific steps of synthesizing speech from the input text using speech segments are as follows:
S2.2.1: Build a speech database, also called a speech library or speech-unit library, in advance; it contains a large number of recorded utterances covering the various possible phonemes, syllables, words, and phrases;
S2.2.2: Convert the input text into the corresponding phoneme sequence, either with a text-to-phoneme conversion tool or through the internal processing of the speech synthesis system;
S2.2.3: Based on the phoneme sequence, select from the speech library an appropriate speech segment for each phoneme, where each segment generally corresponds to one phoneme or a group of phonemes;
S2.2.4: Splice the selected speech segments together in order to form continuous speech output, adjusting and smoothing the segments as needed so that the spliced speech is fluent and natural;
S2.2.5: Perform post-processing on the generated speech output, such as adjusting pitch, volume, and speech rate, to improve its quality and naturalness.
Specifically, in step S2, the voice cloning includes a voice feature extraction module, which extracts features from the synthesized speech; signal processing techniques are used to smooth the speech, which is then output through voice cloning.
In the present invention, in step S3, before the generated text is output, it is checked as follows:
S3.1: First, set a check on whether the generated result fits the theme: after the text is generated, judge whether it matches the display theme; if it does, proceed to the next check; otherwise, re-edit and regenerate the text;
S3.2: Then, check whether the sentences of the generated result are fluent; if so, output the text; otherwise, re-edit and regenerate it.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions.
While the invention has been described with reference to preferred embodiments, it is not intended to be limiting. Those skilled in the art will appreciate that various modifications and adaptations can be made without departing from the spirit and scope of the present invention. Accordingly, the scope of the invention is defined by the appended claims.
Claims (5)
1. A plane design display method based on AIGC technology, characterized in that the method comprises the following steps:
S1: Input object features, appearance features, and dynamic features. An encoder analyzes the object features and a decoder outputs the result; another encoder analyzes the appearance features and the dynamic features, the two analysis results are fused, and the decoder outputs the fused result. The analysis results of the input object features, appearance features, and dynamic features are then fused, forming the key entities. On this basis, dedicated one-to-one encoders analyze the specific object features, specific appearance features, and specific dynamic features respectively; after the dynamic entity features are obtained, the model splices them with the original dynamic features. Once the key entities are generated, the extracted image features are semantically enhanced through feature fusion to obtain rich information, and the interaction relationship between the key entities and the fused features is incorporated; global visual information is then connected with the description at the language level, and the final description is generated;
S2: Input text information and generate voice output with a speech synthesis engine. Further, speech can be synthesized by concatenation: recorded speech segments prepared in advance are joined according to the input text, a computer algorithm converts the text into audible speech output, and the synthesized speech is output through voice cloning;
S3: Input speech and recognize it with a speech recognizer; a recurrent neural network generates continuous text by learning the context dependencies of the sequence data, taking the output of the previous time step as the input of the current time step until the text is generated;
S4: After the text, image, and speech are generated, one encoder and two decoders analyze them. The encoder encodes the features of the text, image, and speech into a set of mutually aligned features; the decoders learn the associations between static and dynamic information and the relationships within the context, and the text, image, and speech information is cross-matched and fused to finally obtain the visual representation;
S5: An acquisition module collects the text, speech, and image information, a processing module processes the collected information, and the text, speech, and images are divided into different regions after processing. When the text, speech, and images are generated, a decoder analyzes them and a generation module fuses the information to generate a video or image, which is then displayed on an AIGC platform.
2. The plane design display method based on AIGC technology according to claim 1, wherein in step S2, the specific steps of converting the input text into speech are as follows:
S2.1.1: Preprocess the input text, including removing punctuation marks, tokenizing into words or characters, and handling numbers and specific abbreviations;
S2.1.2: Select a speech synthesis model suited to the task requirements; this step includes converting the text into a phoneme sequence (or using word-level phoneme sequences) and analyzing linguistic features such as syllables, phonemes, and tones;
S2.1.3: Train an acoustic model using a large amount of text and corresponding speech data, so that the model learns the correlation between text and audio as well as the acoustic features of the audio;
S2.1.4: Based on the input text and the trained model, generate a speech waveform through the acoustic model to produce the synthesized speech;
S2.1.5: Perform post-processing on the generated speech, such as adjusting pitch, volume, and speech rate, to improve its quality and naturalness.
3. The plane design display method based on AIGC technology according to claim 1, wherein in step S2, the specific steps of synthesizing speech from the input text using speech segments are as follows:
S2.2.1: Build a speech database, also called a speech library or speech-unit library, in advance; it contains a large number of recorded utterances covering the various possible phonemes, syllables, words, and phrases;
S2.2.2: Convert the input text into the corresponding phoneme sequence, either with a text-to-phoneme conversion tool or through the internal processing of the speech synthesis system;
S2.2.3: Based on the phoneme sequence, select from the speech library an appropriate speech segment for each phoneme, where each segment generally corresponds to one phoneme or a group of phonemes;
S2.2.4: Splice the selected speech segments together in order to form continuous speech output, adjusting and smoothing the segments as needed so that the spliced speech is fluent and natural;
S2.2.5: Perform post-processing on the generated speech output, such as adjusting pitch, volume, and speech rate, to improve its quality and naturalness.
4. The plane design display method based on AIGC technology according to claim 1, wherein in step S2, the voice cloning includes a voice feature extraction module, which extracts features from the synthesized speech; signal processing techniques are used to smooth the speech, which is then output through voice cloning.
5. The plane design display method based on AIGC technology according to claim 1, wherein in step S3, before the generated text is output, it is checked as follows:
S3.1: First, set a check on whether the generated result fits the theme: after the text is generated, judge whether it matches the display theme; if it does, proceed to the next check; otherwise, re-edit and regenerate the text;
S3.2: Then, check whether the sentences of the generated result are fluent; if so, output the text; otherwise, re-edit and regenerate it.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410182259.6A CN117992169A (en) | 2024-02-19 | 2024-02-19 | Plane design display method based on AIGC technology |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117992169A | 2024-05-07 |
Family
ID=90888538
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |