CN117874239B

CN117874239B - Content generation method, device, equipment and storage medium

Info

Publication number: CN117874239B
Application number: CN202410270954.8A
Authority: CN
Inventors: 徐雪阳; 刘煜宏; 刘威; 郭春超; 石惠文
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd
Priority date: 2024-03-11
Filing date: 2024-03-11
Publication date: 2024-06-11
Anticipated expiration: 2044-03-11
Also published as: CN117874239A

Abstract

The application discloses a content generation method, apparatus, device, and storage medium, and relates to the technical field of AI. The method comprises: acquiring an input text; determining, from a plurality of classification categories, a first classification category to which the input text belongs, wherein the plurality of classification categories comprise at least one violation category and one compliance category, a violation category being the category to which violating content contained in the input text belongs, and the compliance category indicating that the input text contains no violating content; determining a prompt word corresponding to the first classification category to obtain a first prompt word, wherein the first prompt word indicates a content generation requirement corresponding to the input text; and generating, through a generation model, output content corresponding to the input text according to the input text and the first prompt word. Because a corresponding prompt word is designed for the classification category to which the input text belongs, the generation model is guided to generate positive output content, which enhances the safety and compliance of content generation.

Description

Content generation method, device, equipment and storage medium

Technical Field

The present application relates to the technical field of AI (Artificial Intelligence), and in particular, to a content generation method, apparatus, device, and storage medium.

Background

With the advent of the era of Artificial Intelligence Generated Content (AIGC), a number of large generative models have emerged, which produce rich content every day from users' input text.

The related art includes a keyword-based content filtering technique that relies primarily on a preset list of keywords and regular expressions to identify and filter inappropriate content in the input text. The keyword list typically contains violating content, offensive words, or combined phrases that are deemed impermissible. When the text input by the user or the content generated by the model contains a listed keyword, the system triggers a filtering mechanism and, depending on the situation, truncates the content generated by the large model or terminates the session between the large model and the user.

However, the above method relies on a predefined keyword list, which often cannot be updated in time and is relatively rigid, and may therefore misidentify normal content as inappropriate content or vice versa.
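The related-art filter described above can be illustrated with a minimal sketch. The keyword list, the regular expression, and the function names are all hypothetical and chosen only for illustration; the source does not disclose any actual list.

```python
import re

# Hypothetical blocklist; the source does not list actual keywords.
BLOCKED_KEYWORDS = ["counterfeit", "pirated"]
BLOCKED_PATTERNS = [re.compile(r"\bhow to (forge|fake)\b", re.IGNORECASE)]


def keyword_filter(text: str) -> bool:
    """Return True if the text trips the keyword list or a regular expression."""
    lowered = text.lower()
    if any(keyword in lowered for keyword in BLOCKED_KEYWORDS):
        return True
    return any(pattern.search(text) for pattern in BLOCKED_PATTERNS)


def respond(text: str) -> str:
    """Related-art behavior: a hit truncates the output or ends the session."""
    if keyword_filter(text):
        return "[session terminated]"
    return "[model output]"
```

With such a filter, a harmless request like "How do I report a counterfeit banknote to the police?" is blocked simply because it contains a listed keyword, which is the kind of misidentification the paragraph above refers to.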

Disclosure of Invention

The embodiment of the application provides a content generation method, a device, equipment and a storage medium. The technical scheme provided by the embodiment of the application comprises the following aspects.

According to an aspect of an embodiment of the present application, there is provided a content generation method including: acquiring an input text; determining a first classification category to which the input text belongs from a plurality of classification categories, wherein the plurality of classification categories comprise at least one violation category and one compliance category, the violation category refers to a category to which violation content contained in the input text belongs, and the compliance category refers to that no violation content is contained in the input text; determining a prompt word corresponding to the first classification category to obtain a first prompt word, wherein the first prompt word is used for indicating a content generation requirement corresponding to the input text; and generating output content corresponding to the input text through a generation model according to the input text and the first prompt word.

According to an aspect of an embodiment of the present application, there is provided a content generating apparatus including: a text acquisition module, configured to acquire an input text; a classification determining module, configured to determine, from a plurality of classification categories, a first classification category to which the input text belongs, wherein the plurality of classification categories comprise at least one violation category and one compliance category, the violation category refers to a category to which violating content contained in the input text belongs, and the compliance category refers to that no violating content is contained in the input text; a prompt word determining module, configured to determine a prompt word corresponding to the first classification category to obtain a first prompt word, wherein the first prompt word is used for indicating a content generation requirement corresponding to the input text; and a content output module, configured to generate, through a generation model, output content corresponding to the input text according to the input text and the first prompt word.

According to an aspect of an embodiment of the present application, there is provided a computer apparatus including a processor and a memory in which a computer program is stored, the computer program being loaded and executed by the processor to implement the above-described content generating method.

According to an aspect of an embodiment of the present application, there is provided a computer-readable storage medium having stored therein a computer program loaded and executed by a processor to implement the above-described content generating method.

According to an aspect of an embodiment of the present application, there is provided a computer program product comprising a computer program loaded and executed by a processor to implement the above-described content generation method.

The technical scheme provided by the embodiment of the application can bring the following beneficial effects: a first classification category to which the input text belongs is determined from a plurality of classification categories, a first prompt word corresponding to the first classification category is determined, and output content corresponding to the input text is generated through a generation model according to the input text and the first prompt word. In the related art, violating content in the input text is identified and the content generated by the generation model is then truncated. By contrast, the technical scheme provided by the application designs a corresponding prompt word for the classification category to which the input text belongs. Because the prompt word indicates the content generation requirement corresponding to the input text, once the input text and the prompt word are fed into the generation model, the generation model can still respond to input text that contains violating content. Compared with traditional content filtering methods, this improves the user's interactive experience and reduces the possibility of misidentifying normal content and inappropriate content. The prompt word guides the generation model toward positive output content as far as possible, so that the potential of the generation model is fully exerted, the risk of generating violating content is reduced, and the safety and compliance of content generation are enhanced.

Drawings

FIG. 1 is a schematic diagram of an implementation environment of an embodiment of the present application.

FIG. 2 is a flowchart of a content generation method provided in one embodiment of the present application.

FIG. 3 is a flowchart of determining, through a classification model, the classification category to which the input text belongs, provided by one embodiment of the application.

FIG. 4 is a schematic diagram of the correspondence between a plurality of classification categories and prompt words according to one embodiment of the present application.

FIG. 5 is a schematic diagram of a retraining process for a classification model provided by one embodiment of the application.

FIG. 6 is a schematic diagram of a reclassification process for a classification model provided by one embodiment of the application.

FIG. 7 is a schematic diagram of a classification model-based content optimization process provided by one embodiment of the application.

FIG. 8 is a block diagram of a content generating apparatus provided in one embodiment of the present application.

FIG. 9 is a block diagram of a computer device according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings.

Artificial intelligence is the theory, method, technique and application system that uses a digital computer or a digital computer-controlled machine to simulate, extend and expand human intelligence, sense the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning and decision-making.

Artificial intelligence technology is a comprehensive discipline that covers a wide range of fields, including both hardware-level and software-level technologies. Basic artificial intelligence technologies generally include sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing technologies, pre-training model technologies, operation/interaction systems, mechatronics, and the like. The pre-training model, also called a large model or a foundation model, can be widely applied to downstream tasks in all major directions of artificial intelligence after fine-tuning. Artificial intelligence software technology mainly includes computer vision technology, speech processing technology, natural language processing technology, machine learning/deep learning, and other directions.

Machine learning (Machine Learning, ML for short) is a multi-field interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory, and other disciplines. It specializes in studying how a computer simulates or implements human learning behavior to acquire new knowledge or skills, and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout the various fields of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstration. The pre-training model is the latest development of deep learning and integrates the above techniques.

A large language model (Large Language Model, abbreviated LLM) is an artificial intelligence algorithm based on deep learning techniques, with the goal of letting the computer understand and generate natural language. It learns the structure and regularity of a language by analyzing a large amount of language data such as text, speech or images, and uses this knowledge to accomplish various natural language processing tasks such as machine translation, speech recognition, text classification, and question answering. Large language models typically use the Transformer architecture from deep learning to model text sequences in order to understand context and semantics. The training process typically involves a large amount of data and computing resources, such as a large corpus and a high-performance computing platform. During training, the large language model gradually learns the characteristics and rules of the language, and forms the ability to understand and express language.

The Transformer architecture is a deep learning model that employs a self-attention mechanism, which assigns different weights according to the importance of the various parts of the input data. The architecture is mainly used in the fields of natural language processing and computer vision (Computer Vision, CV). The architecture generally includes self-attention (Self-Attention), multi-head attention (Multi-Head Attention), positional encoding (Positional Encoding), residual connection and normalization (Add & Norm), a feed-forward network (Feed-Forward Network), a position-wise feed-forward network (Position-wise Feed-Forward Network), and the like, which constitute the encoder and decoder.

A pre-training model (Pre-Training Model, PTM for short), also called a foundation model or a large model, refers to a deep neural network (Deep Neural Network, DNN for short) with a large number of parameters. It is trained on massive unlabeled data, using the function approximation capability of the large-parameter DNN to extract common features from the data, and is adapted to downstream tasks through techniques such as fine-tuning, parameter-efficient fine-tuning (PEFT), and prompt-tuning. Therefore, the pre-training model can achieve good results in few-shot or zero-shot scenarios. According to the data modality processed, PTMs can be classified into language models (ELMo, BERT, GPT), visual models (Swin Transformer, ViT, V-MoE), speech models (VALL-E), multi-modal models (ViBERT, CLIP, Flamingo, Gato), and so on, where a multi-modal model refers to a model that builds a representation of the features of two or more data modalities. The pre-training model is an important tool for producing artificial intelligence generated content, and can also serve as a general interface for connecting a plurality of task-specific models.

With the research and advancement of artificial intelligence technology, artificial intelligence has been studied and applied in many fields, such as smart homes, smart wearable devices, virtual assistants, smart speakers, smart marketing, unmanned driving, autonomous driving, unmanned aerial vehicles, digital twins, virtual humans, robots, artificial intelligence generated content, conversational interaction, smart healthcare, smart customer service, game AI, virtual reality (Virtual Reality, VR), and augmented reality (Augmented Reality, AR). It is believed that, as the technology develops, artificial intelligence will be applied in more fields and play an increasingly important role.

The technical scheme of the application mainly relates to machine learning technology within artificial intelligence, and in particular to the training and use of a classification model.

Referring to fig. 1, a schematic diagram of an implementation environment of an embodiment of the present application is shown. The implementation environment of the scheme can be realized as a content generation system. The implementation environment of the scheme can comprise: a terminal device 10 and a server 20.

The number of terminal devices 10 may be one or more. The terminal device 10 may be, but is not limited to, an electronic device such as a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, or a smart watch. The terminal device 10 may be provided with a client of a target application program, which implements the function of generating, from text input by a user, output content corresponding to the input text. Optionally, the application may be one that needs to be downloaded and installed, or a click-to-use application that requires no installation, which is not limited in the embodiment of the present application.

The server 20 is used to provide a background service for the client of the target application program installed and running on the terminal device 10. For example, the server 20 may be a background server of the application program described above. The server 20 may be a separate physical server, a server cluster composed of a plurality of servers, or a cloud computing service center. Optionally, the server 20 provides background services for the application programs in a plurality of terminal devices 10 at the same time. The terminal device 10 and the server 20 can communicate with each other via a network.

In the embodiment of the present application, the execution body of each step may be a computer device, and the computer device refers to an electronic device having data computing, processing and storage functions. The computer device may be, for example, the terminal device 10 as in fig. 1, or the server 20.

In the embodiment of the application, a user inputs a text in a target application program, a first classification category to which the input text belongs is determined from a plurality of classification categories through a classification model, and a prompt word corresponding to the first classification category is obtained according to the correspondence between the plurality of classification categories and the prompt words. The generation model then generates output content corresponding to the input text according to the input text and the prompt word corresponding to the first classification category. The classification model is a model obtained through pre-training. The generation model may be any publicly available large language model, for example a natural language model based on the Transformer structure and trained on a large amount of data (for example, on the order of hundreds of millions of samples or more), which is not limited in the present application.
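The flow described above (classify, look up the prompt word, generate) can be sketched as a single function. This is only an illustrative sketch: the function name `generate_content` and its parameters are hypothetical, and the classifier and generation model are passed in as plain callables standing in for the real models.

```python
from typing import Callable, Mapping


def generate_content(
    input_text: str,
    classify: Callable[[str], str],
    prompt_words: Mapping[str, str],
    generate: Callable[[str, str], str],
) -> str:
    """Classify the input text, look up its prompt word, then generate output."""
    # Determine the first classification category of the input text.
    category = classify(input_text)
    # Look up the prompt word that corresponds to that category.
    prompt_word = prompt_words[category]
    # Hand both the input text and the prompt word to the generation model.
    return generate(input_text, prompt_word)
```

Passing the models in as parameters keeps the sketch independent of any particular classifier or large language model, consistent with the statement that the generation model is not limited.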

Referring to fig. 2, a flowchart of a content generating method according to an embodiment of the present application is shown. The subject of execution of the steps of the method may be a computer device. The method may include at least one of the following steps 210-240.

Step 210, an input text is obtained.

The input text represents the content generation requirement of the user, and may be a word, a phrase, or a descriptive passage. The content generation requirement corresponding to the input text may be output content in text form or output content in image form. For example, if the input text is "please tell me who the men's table tennis champion is", the content generation requirement corresponding to the input text is to generate output content in text form. If the input text is "please output a picture of the men's table tennis champion receiving the award", the content generation requirement corresponding to the input text is to generate output content in image form.

The user types the input text in the client of the target application program, and the computer equipment obtains the input text through the client so as to generate output content corresponding to the input text.

Step 220, determining a first classification category to which the input text belongs from a plurality of classification categories, wherein the plurality of classification categories include at least one violation category and one compliance category, the violation category refers to a category to which the violation content contained in the input text belongs, and the compliance category refers to a category to which the violation content is not contained in the input text.

In some embodiments, the first classification category is determined by a classification model: the input text is input into the classification model, which outputs the classification category to which the input text belongs, i.e., the first classification category.

In some embodiments, the classification model includes a feature extractor and a feature discriminator. The process of determining the classification category to which the input text belongs through the classification model is shown in fig. 3: the feature extractor extracts features of the input text to obtain the feature information of the input text, and the feature discriminator determines, according to the feature information of the input text, the classification category to which the input text belongs, obtaining the first classification category.

The feature extractor in the classification model comprises a text feature extractor, and the text feature extractor is used for extracting features of the input text to obtain feature information of the input text. The feature discriminator in the classification model is used for discriminating the classification category to which the input feature information belongs, the feature information of the input text is input into the feature discriminator, and the first classification category to which the input text belongs is output.

By extracting the characteristic information of the input text and adopting the characteristic discriminator to discriminate the classification category to which the input text belongs, the characteristic information of the input text can be subjected to category analysis, thereby ensuring the effectiveness and accuracy of category discrimination.
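The two-part classification model can be sketched as follows. This is a toy stand-in under stated assumptions: the bag-of-words extractor, the hand-set `WEIGHTS` table, the threshold, and the category names are all hypothetical, whereas the source describes a pre-trained model and does not specify its internals.

```python
from collections import Counter
from typing import Dict

# Hypothetical per-category token weights standing in for learned parameters.
WEIGHTS: Dict[str, Dict[str, float]] = {
    "illegal": {"forge": 2.0, "smuggle": 2.0},
    "pornography": {"explicit": 2.0},
    "infringement": {"pirated": 2.0, "logo": 1.0},
}


def extract_features(text: str) -> Counter:
    """Feature extractor: a toy bag-of-words over lowercase tokens."""
    return Counter(text.lower().split())


def discriminate(features: Counter, threshold: float = 1.5) -> str:
    """Feature discriminator: pick the best-scoring violation category,
    or fall back to the compliance category if no score reaches the threshold."""
    scores = {
        category: sum(weight * features[token] for token, weight in weights.items())
        for category, weights in WEIGHTS.items()
    }
    best = max(scores, key=scores.get)
    return best if scores[best] >= threshold else "compliance"


def classify(text: str) -> str:
    """Classification model: feature extraction followed by discrimination."""
    return discriminate(extract_features(text))
```

For example, `classify("please forge a passport")` falls into the hypothetical illegal category, while a question about a table tennis final falls into the compliance category.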

The plurality of classification categories comprise n violation categories and one compliance category, where n is a positive integer. The n violation categories include an illegal category, a pornography category, an infringement category, and the like. A violation category indicates that the input text contains violating content, and the compliance category means that the input text does not contain violating content. If the first classification category is one of the n violation categories, the input text contains violating content; if the first classification category is the compliance category, the input text contains no violating content.

Violating content is preset inappropriate content, and generally relates to harmful information that is illegal or non-compliant, pornographic, unethical, and the like. A technician presets a violating content list that contains at least one item of violating content, where the violating content includes, but is not limited to, violating words, combined phrases, and descriptive passages with violating meanings.

In some embodiments, the feature discriminator detects the input text according to the feature information of the input text, if the input text contains the offending content, the class to which the offending content belongs is discriminated, and the offending class to which the offending content belongs is determined as the first classification class.

The feature discriminator detects whether the input text contains violating content according to the preset violating content list. If the feature information of the input text contains the feature information of at least one item of violating content in the list, the input text contains at least one item of violating content. The feature discriminator then determines, according to the feature information of that violating content, the violation category to which the violating content belongs, and this violation category is the first classification category.

In some embodiments, if no offending content is included in the input text, the compliance category is determined to be a first classification category.

If the feature information of any illegal content is not detected in the feature information of the input text, the input text does not contain the illegal content, and the first classification class is the compliance class.

In this way, whether the input text contains violating content is detected first; when it does, the category to which the violating content belongs is further detected, and that violation category is determined as the first classification category. By refining the violation category to which the input text belongs, content generation can be handled in a manner targeted at that category, so that output content meeting the requirements is generated, the safety and compliance of content generation are improved, and the risk of generating violating content is reduced.
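The detect-then-categorize logic can be sketched as below. As a loudly stated simplification, substring matching on raw text stands in for the comparison of feature information described in the source, and the list entries, category names, and function names are hypothetical.

```python
from typing import Optional

# Hypothetical violating content list: each entry maps to its violation category.
VIOLATION_LIST = {
    "forge a passport": "illegal",
    "explicit scene": "pornography",
    "copy the brand logo": "infringement",
}


def find_violation(text: str) -> Optional[str]:
    """Return the first listed item found in the text, or None if there is none."""
    lowered = text.lower()
    for item in VIOLATION_LIST:
        if item in lowered:
            return item
    return None


def first_classification_category(text: str) -> str:
    """Map detected violating content to its category, else return compliance."""
    item = find_violation(text)
    if item is None:
        return "compliance"
    return VIOLATION_LIST[item]
```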

Step 230, determining a prompt word corresponding to the first classification category, and obtaining a first prompt word, where the first prompt word is used to indicate a content generation requirement corresponding to the input text.

The first prompting word is a prompting word corresponding to the first classification category, and can also be regarded as a prompting word corresponding to the input text.

The content generation requirement corresponding to the input text means that the generated output content does not contain illegal content corresponding to the first classification category. For example, if the first classification class is an illegal class, the prompt word corresponding to the illegal class is used for indicating that the generated output content does not contain illegal content. If the first classification category is pornography category, the prompting word corresponding to the pornography category is used for indicating that the generated output content does not contain pornography related content.

In some embodiments, according to the correspondence between the plurality of classification categories and the prompt words, the prompt words corresponding to the first classification category are determined, and the first prompt words are obtained.

The correspondence between the plurality of classification categories and the prompt words (text classification rules) may be as shown in fig. 4, where each classification category corresponds to one prompt word. For example, the compliance category corresponds to prompt word 1, the illegal category corresponds to prompt word 2, the pornography category corresponds to prompt word n, and the infringement category corresponds to prompt word n+1. A prompt word indicates the content processing requirement of the corresponding classification category, i.e., the content generation requirement for the input text.

If the first classification category is a violation category, for example the illegal category, prompt word 2 indicates that the content generation requirement corresponding to the input text is that the output content complies with the laws, regulations and standards of the relevant countries and regions. For example, prompt word 2 may directly include a content generation requirement for the output content, and may also include a processing method for the violating content, such as masking or filtering the violating content, or performing compliance processing on it.

If the first classification category is a compliance category, the prompt word 1 may be blank data, and the prompt word 1 may also be used to indicate that the input text belongs to the compliance category, for example, the prompt word 1 is used to indicate that the input text does not contain the offending content.

Based on the preset corresponding relation between a plurality of classification categories and the prompt words, the prompt words corresponding to the first classification categories are determined according to the first classification category to which the input text belongs, so that the content requirements corresponding to the input text can be determined, the generation model can generate output content meeting the requirements according to the first prompt words, the safety and compliance of content generation are ensured as much as possible, and the risk of generating illegal content is reduced.
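The preset correspondence can be held in a simple lookup table, sketched below. The category names and the wording of each prompt word are hypothetical; the source states only that each category has one prompt word and that the compliance prompt word may be blank.

```python
# Hypothetical correspondence table between classification categories and prompt words.
PROMPT_WORDS = {
    "compliance": "",  # the compliance prompt word may be blank
    "illegal": "The output must comply with applicable laws and regulations.",
    "pornography": "The output must not contain pornographic content.",
    "infringement": "The output must not infringe the rights of others.",
}


def first_prompt_word(category: str) -> str:
    """Return the prompt word for a category; an unknown category raises KeyError."""
    return PROMPT_WORDS[category]
```

Letting an unknown category raise an error, rather than silently returning a blank prompt word, is a design choice of this sketch so that a classifier and table that drift apart are noticed early.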

Step 240, generating output content corresponding to the input text through a generation model according to the input text and the first prompt word.

The input text and the first prompt word are concatenated, and the concatenated text is input into the generation model to generate the output content corresponding to the input text. Optionally, the first prompt word may be added before the input text as a prefix, added after the input text as a suffix, or inserted at the position of the violating content as supplementary text for that content.
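The three concatenation options can be sketched as follows. The function name `splice`, the `mode` values, the newline separator, and the parenthesized inline form are assumptions made for illustration; the source does not specify separators or formatting.

```python
from typing import Optional


def splice(
    input_text: str,
    prompt_word: str,
    mode: str = "prefix",
    violating_span: Optional[str] = None,
) -> str:
    """Concatenate the prompt word with the input text as prefix, suffix, or inline."""
    if not prompt_word:
        # A blank prompt word (e.g. for the compliance category) leaves the text as-is.
        return input_text
    if mode == "prefix":
        return f"{prompt_word}\n{input_text}"
    if mode == "suffix":
        return f"{input_text}\n{prompt_word}"
    if mode == "inline":
        if violating_span is None or violating_span not in input_text:
            raise ValueError("inline mode needs a violating span found in the text")
        # Insert the prompt word right after the first occurrence of the span.
        end = input_text.index(violating_span) + len(violating_span)
        return f"{input_text[:end]} ({prompt_word}){input_text[end:]}"
    raise ValueError(f"unknown mode: {mode}")
```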

In some embodiments, output content in a text form corresponding to the input text is generated by a text generation model according to the input text and the first prompt.

The generation model comprises a text generation model and an image generation model. The generation model determines the content generation requirement corresponding to the input text according to the input text and the first prompt word. If the requirement is to generate output content in text form, the text generation model is used to generate output content in text form corresponding to the input text according to the input text and the first prompt word.

Illustratively, on an AI text generation platform, a user types in an input text to generate output content in text form. By inputting the input text and its corresponding prompt word into the text generation model, the generated text output can be kept free of illegal, unethical, or harmful elements as far as possible. For example, if the user's input text contains violating content that implies pornography or a violation of law, the text generation model modifies or filters those violating portions, ensuring the safety of the final result. This approach not only improves the safety of content generation, but also preserves the free flow of creative content instead of simply blocking the user's request.

In some embodiments, output content in the form of an image corresponding to the input text is generated by an image generation model from the input text and the first prompt.

If the content generation requirement corresponding to the input text is to generate output content in image form, the image generation model is used to generate output content in image form corresponding to the input text according to the input text and the first prompt word.

Illustratively, on an AI image generation platform, a user types in an input text to generate output content in image form. By inputting the input text and its corresponding prompt word into the image generation model, the image generation model can be guided, as far as possible, to avoid generating images with pornographic, violent, or otherwise inappropriate elements. For example, if the user's input text contains violating content that suggests generating an improper image, the image generation model modifies or filters the violating portions, and the model is guided to generate harmless or positive output content in image form. This not only reduces the risk of generating improper images, but also avoids some infringing drawings, thereby avoiding legal and public opinion risks.

By setting the text generation model and the image generation model in the generation model, output content in a corresponding form can be generated according to content generation requirements corresponding to the input text, and the expandability of the generation model is improved on the basis of ensuring the safety of content generation.

According to the technical solution provided by the embodiments of the present application, the first classification category to which the input text belongs is determined from a plurality of classification categories, the first prompt word corresponding to the first classification category is determined, and output content corresponding to the input text is generated by the generation model according to the input text and the first prompt word. In the prior art, violating content in the input text is identified and the content generated by the generation model is then truncated. By contrast, the technical solution provided by the present application designs a corresponding prompt word for the classification category to which the input text belongs. Because the prompt word indicates the content generation requirement corresponding to the input text, once the input text and the prompt word are fed into the generation model, the generation model can still respond to input text that contains violating content. Compared with conventional content filtering methods, this improves the user's interactive experience and reduces the possibility of misidentifying normal content as improper content. The prompt word guides the generation model, as far as possible, to generate positive output content, so that the potential of the generation model is fully used, the risk of generating violating content is reduced, and the safety and compliance of content generation are enhanced.
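The classify, look up, and generate flow summarized above can be sketched as follows. This is a minimal illustration only: the category names, the prompt word strings, the keyword-based `classify` stand-in, and the `echo_model` placeholder are all hypothetical and are not taken from the application, which does not disclose concrete values or model interfaces.

```python
from typing import Callable, Dict

# Hypothetical correspondence between classification categories and prompt words.
PROMPT_WORDS: Dict[str, str] = {
    "compliance": "Respond to the request as written.",
    "pornography": "Respond without any sexually explicit content.",
    "violence": "Respond without depicting or encouraging violence.",
}


def classify(input_text: str) -> str:
    """Toy stand-in for the classification model (keyword matching only)."""
    lowered = input_text.lower()
    if "explicit" in lowered:
        return "pornography"
    if "attack" in lowered:
        return "violence"
    return "compliance"


def generate_content(input_text: str,
                     generation_model: Callable[[str], str]) -> str:
    """Determine the first category, look up its prompt word, then generate."""
    first_category = classify(input_text)
    first_prompt_word = PROMPT_WORDS[first_category]
    # The prompt word is sent to the generation model together with the input
    # text, so the request is answered rather than truncated or rejected.
    return generation_model(f"{first_prompt_word}\n{input_text}")


def echo_model(model_input: str) -> str:
    """Placeholder generation model that only echoes what it received."""
    return f"[generated from] {model_input}"
```

In this sketch the input text is always passed through to the generation model; only the accompanying prompt word changes with the category, which mirrors the contrast with truncation-based filtering described above.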

In some embodiments, because the classification model is trained using only input text as training data, input texts that differ widely in common understanding may be regarded by the classification model as having similar feature information in the feature space, which creates a possibility of misclassification. When the input text contains violating content, the violation category to which the input text belongs may be classified incorrectly, so that output content containing violating content is generated. For example, if the input text actually belongs to a law-violation category but the classification model misclassifies it into the pornography category, the output content generated by the generation model will not contain pornography-related content but may contain law-violating content. When the input text contains violating content, the input text may also be misclassified into the compliance category, and output content containing violating content may again be generated. Because the classification model has no awareness of the output content that the generation model produces for the input text, the output content of the generation model can be fed back to the classification model for training, and the differences between output contents can be used to help the classification model recognize the differences between input texts as correctly as possible.

The classification model used in step 220 above is a pre-trained model, trained using original training data. Each training sample of the original training data includes a sample text and a labeling classification category, where the sample text is an input text used for training the classification model, and the labeling classification category is the pre-labeled classification category to which the sample text belongs. The classification category to which the sample text belongs is obtained from the sample text through the untrained classification model, a loss function value is calculated according to the difference between that classification category and the labeling classification category, and the parameters of the untrained classification model are adjusted with the goal of minimizing the loss function value, to obtain the classification model. The loss function may be any one of a mean square error loss function, a cross entropy loss function, a log likelihood loss function, an exponential loss function, and the like, which is not limited in the present application.
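As one illustration of the loss calculation described above, the sketch below computes a cross entropy loss value from predicted category probabilities and labeling category indices. Cross entropy is only one of the loss functions the text lists; the function names and the representation of predictions as probability lists are assumptions made for this example.

```python
import math
from typing import Sequence


def cross_entropy_loss(predicted_probs: Sequence[float], label_index: int) -> float:
    """Cross entropy between a predicted distribution and a one-hot label."""
    eps = 1e-12  # guards against log(0) when the model assigns zero probability
    return -math.log(max(predicted_probs[label_index], eps))


def mean_loss(batch_probs: Sequence[Sequence[float]],
              batch_labels: Sequence[int]) -> float:
    """Average loss over a batch of (prediction, labeling category) pairs."""
    if len(batch_probs) != len(batch_labels) or not batch_labels:
        raise ValueError("batch must be non-empty and aligned")
    losses = [cross_entropy_loss(p, y) for p, y in zip(batch_probs, batch_labels)]
    return sum(losses) / len(losses)
```

A training loop would adjust the classification model's parameters to reduce the value returned by `mean_loss`; the optimizer itself is outside the scope of this sketch.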

For the retraining process of the classification model, refer to fig. 5. On the basis of the pre-trained classification model, the classification model obtains the first classification category to which the input text belongs according to the input text, the first prompt word corresponding to the first classification category is determined, and output content corresponding to the input text is generated by the generation model according to the input text and the first prompt word. The output content is fed back into the training data to obtain fed-back training data. Each training sample of the fed-back training data includes a sample text, output content corresponding to the sample text, and a labeling classification category, where the sample text is an input text used for training the classification model, the output content corresponding to the sample text is the output content generated by the generation model according to the sample text and the prompt word corresponding to the sample text, and the labeling classification category is the pre-labeled classification category to which the sample text belongs. The classification model is retrained with the fed-back training data, which further improves the classification capability of the classification model.
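The structure of a fed-back training sample can be sketched as below. The `FeedbackSample` type, the `build_feedback_data` helper, and the three callables it accepts are hypothetical names introduced for illustration; the application does not specify data structures or interfaces.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass
class FeedbackSample:
    sample_text: str        # input text used for training
    output_content: str     # what the generation model produced for it
    labeled_category: str   # pre-labeled (ground-truth) category


def build_feedback_data(
    original: Sequence[Tuple[str, str]],
    classify: Callable[[str], str],
    prompt_for: Callable[[str], str],
    generate: Callable[[str, str], str],
) -> List[FeedbackSample]:
    """Attach generated output content to each (sample text, label) pair."""
    samples: List[FeedbackSample] = []
    for text, label in original:
        category = classify(text)        # category from the pre-trained model
        prompt = prompt_for(category)    # prompt word for that category
        output = generate(text, prompt)  # output content to feed back
        # The label stays the pre-labeled category, not the predicted one.
        samples.append(FeedbackSample(text, output, label))
    return samples
```

Keeping the pre-labeled category unchanged is what lets a misclassification show up during retraining: the output content reflects the wrong prompt word while the label still records the correct category.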

The retraining process of the classification model may further include at least one of steps 250-260 (not shown).

Step 250, determining a second classification category from the plurality of classification categories according to the input text and the output content through the classification model.

In the retraining process of the classification model, the input data of the classification model comprises the input text and the output content of the generation model, and the output data of the classification model comprises the second classification category.

The second classification category is a classification category obtained by reclassifying the classification category to which the input text belongs based on the input text and the output content by the classification model. The second classification category may be the same classification category as the first classification category or may be a different classification category than the first classification category.

Step 250 includes at least one sub-step of steps 251-253 (not shown).

In step 251, feature extraction is performed on the input text and the output content through a feature extractor in the classification model, so as to obtain feature information of the input text and feature information of the output content.

Feature extraction is performed on the input text by a text feature extractor to obtain the feature information of the input text.

The feature extractor in the classification model further includes an image feature extractor, and if the text-form output content is generated by the text generation model in the step 240, the text-form output content is subjected to feature extraction by the text feature extractor, so as to obtain feature information of the output content. If the output content in the image form is generated by the image generation model in the above step 240, the image feature extractor performs feature extraction on the output content in the image form to obtain feature information of the output content.

Step 252, obtaining a fusion feature according to the feature information of the input text and the feature information of the output content.

The fusion features are input data of a feature discriminator of the classification model in the retraining process of the classification model.

In some embodiments, the feature information of the input text and the feature information of the output content may be spliced to obtain the fusion feature; that is, the feature information of the output content is used to help the classification model recognize the differences between input texts as correctly as possible.

In some embodiments, a difference feature of the feature information of the input text relative to the feature information of the output content is determined, and the fusion feature is obtained according to the feature information of the input text, the feature information of the output content, and the difference feature.

The difference feature is the difference information of the feature information of the input text relative to the feature information of the output content, and it can be used to help the classification model recognize the differences between input texts as correctly as possible. The feature information of the input text, the feature information of the output content, and the difference feature are spliced to obtain the fusion feature.

By using the difference feature of the feature information of the input text relative to the feature information of the output content, the classification model becomes further aware of the differences between input texts and judges again the classification category to which the input text belongs, which improves the accuracy of category judgment and reduces the probability of misclassification.
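Both ways of forming the fusion feature, splicing two feature vectors or splicing three including the difference feature, can be sketched with plain lists. The function name `fuse_features` and the use of Python lists as feature vectors are assumptions made for illustration.

```python
from typing import List, Optional, Sequence


def fuse_features(
    text_feat: Sequence[float],
    output_feat: Sequence[float],
    diff_feat: Optional[Sequence[float]] = None,
) -> List[float]:
    """Splice (concatenate) feature vectors into a single fusion feature.

    Without diff_feat, this is the two-way splice of input-text and
    output-content features; with diff_feat, the difference feature is
    appended as a third segment.
    """
    fused = list(text_feat) + list(output_feat)
    if diff_feat is not None:
        fused += list(diff_feat)
    return fused
```

For example, `fuse_features([0.1, 0.2], [0.3])` yields a three-element vector, and passing `diff_feat=[0.4]` yields a four-element one; the feature discriminator would then need an input size matching whichever variant is used.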

To determine the difference feature of the feature information of the input text relative to the feature information of the output content, in some embodiments, feature extraction is performed on the first prompt word through the feature extractor to obtain feature information of the first prompt word; the feature information of the input text and the feature information of the first prompt word are spliced to obtain a spliced feature; and the distance between the spliced feature and the feature information of the output content is determined to obtain the difference feature.

Feature extraction is performed on the first prompt word through the text feature extractor to obtain the feature information of the first prompt word. The spliced feature indicates the feature information of the input data of the generation model in step 240 above, and the feature information of the output content indicates the feature information of the output data of the generation model, so the difference feature can be understood as the difference information of the feature information of the input data of the generation model relative to the feature information of its output data.

The above-mentioned respective feature information may be in the form of vectors.

Optionally, the distance between the spliced feature and the feature information of the output content may be determined by cosine similarity, Euclidean distance, Manhattan distance, or the like. For example, if cosine similarity is used, the cosine distance between each element of the spliced feature and the corresponding element of the feature information of the output content may be calculated, and the resulting per-element cosine distances may be spliced to obtain the difference feature.
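The three distance measures named above can be sketched as follows. Note the assumptions: these functions compute a single distance over whole vectors of equal length, which differs from the per-element variant mentioned in the text, and "cosine distance" is taken here as one minus cosine similarity. The function names are illustrative.

```python
import math
from typing import Sequence


def _check_same_length(a: Sequence[float], b: Sequence[float]) -> None:
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; undefined for zero vectors."""
    _check_same_length(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_a * norm_b)


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line (L2) distance."""
    _check_same_length(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Sum of absolute coordinate differences (L1 distance)."""
    _check_same_length(a, b)
    return sum(abs(x - y) for x, y in zip(a, b))
```

Because the spliced feature combines two feature vectors, its dimension will usually differ from that of the output content's feature information, so in practice the two would first need to be brought to a common dimension before any of these functions applies.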

Optionally, after the spliced feature is obtained, the spliced feature and the feature information of the output content may be mapped to the same feature space through a neural network; after their dimensions are unified, the distance between the spliced feature and the feature information of the output content is calculated to obtain the difference feature, as shown in fig. 6. The neural network may be any one of a deep neural network, a convolutional neural network (CNN), and the like.
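The dimension unification step can be illustrated with a deliberately simplified sketch in which a single linear projection stands in for the neural network. The projection matrices, the function names, and the choice of per-dimension absolute difference as the resulting difference feature are all assumptions for illustration; the application leaves the network and the distance measure open.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def project(matrix: Matrix, vec: Sequence[float]) -> List[float]:
    """Map a vector into the shared space with one linear layer (no bias)."""
    if any(len(row) != len(vec) for row in matrix):
        raise ValueError("matrix column count must match vector length")
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def difference_feature(
    spliced_feat: Sequence[float],
    output_feat: Sequence[float],
    w_spliced: Matrix,
    w_output: Matrix,
) -> List[float]:
    """Project both inputs to a common dimension, then take |a - b| per dimension."""
    a = project(w_spliced, spliced_feat)
    b = project(w_output, output_feat)
    if len(a) != len(b):
        raise ValueError("projections must share the same output dimension")
    return [abs(x - y) for x, y in zip(a, b)]
```

In a trained system the two projection matrices would be learned parameters; here they are passed in so that the shape handling is visible.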

By splicing the feature information of the input text and the feature information of the first prompt word to obtain the spliced feature, the difference feature of the feature information of the input text relative to the feature information of the output content can be determined according to the distance between the spliced feature and the feature information of the output content. The difference feature can thus represent the differences between input texts more accurately, which improves the accuracy of the classification model in category judgment and reduces the probability of misclassification.

Step 253, discriminating, by a feature discriminator in the classification model according to the fusion feature, the classification category to which the input text belongs, to obtain the second classification category.

In some embodiments, whether the input text contains the illegal content is detected by the feature discriminator according to the fusion feature, if the input text contains the illegal content, the category to which the illegal content belongs is discriminated again according to the fusion feature, and the obtained illegal category is determined as the second classification category. If the input text does not contain illegal contents, the compliance class is determined to be a second classification class.
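The two-stage decision described above (first detect whether violating content is present, then decide which violation category applies) can be sketched as follows. The inputs are assumed to be scores already produced from the fusion feature; the `discriminate` name, the 0.5 threshold, and the score dictionary are hypothetical and not specified in the application.

```python
from typing import Dict


def discriminate(
    violation_prob: float,
    category_scores: Dict[str, float],
    threshold: float = 0.5,
) -> str:
    """Return the compliance category or the best-scoring violation category.

    violation_prob: assumed probability that the input contains violating content.
    category_scores: assumed per-violation-category scores.
    """
    if violation_prob < threshold:
        return "compliance"
    if not category_scores:
        raise ValueError("at least one violation category score is required")
    # Second stage: pick the violation category with the highest score.
    return max(category_scores, key=category_scores.get)
```

Separating detection from category selection keeps the compliance decision independent of how many violation categories exist, which is one plausible reading of the two-step description.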

The classification model is used for determining the classification category to which the input text belongs from a plurality of classification categories again according to the input text and combining the output content, so that the accuracy of the classification model in category judgment can be improved, the probability of misclassification is reduced, the accuracy and the safety of content generation are improved, and the generation of the output content containing illegal content due to misclassification is avoided.

Fig. 6 shows the reclassification process of the classification model. Feature extraction is performed on the input text, the first prompt word, and the output content by the feature extractor in the classification model, to obtain the feature information of the input text, the feature information of the first prompt word, and the feature information of the output content, respectively. The feature information of the input text and the feature information of the first prompt word are spliced to obtain the spliced feature; the spliced feature and the feature information of the output content are mapped to the same feature space through a deep neural network; and after their dimensions are unified, the distance between the spliced feature and the feature information of the output content is calculated to obtain the difference feature. The fusion feature is obtained according to the feature information of the input text, the feature information of the output content, and the difference feature; alternatively, the fusion feature may be obtained directly from the feature information of the input text and the feature information of the output content (not shown in fig. 6). The fusion feature is input to the feature discriminator, which outputs the second classification category.

Step 260, adjusting parameters of the classification model according to the difference between the second classification category and the labeling classification category to which the input text belongs to obtain an adjusted classification model, wherein the labeling classification category is the classification category to which the input text labeled in advance belongs, and the adjusted classification model is used for obtaining the adjusted classification category to which the input text belongs according to the input text to generate adjusted output content.

The labeling classification category to which the input text belongs is a category that a technician labels in advance and is used for indicating the classification category to which the input text belongs.

A second loss function value is calculated according to the difference between the second classification category and the labeling classification category, and the parameters of the classification model are adjusted with the goal of minimizing the second loss function value, to obtain the adjusted classification model. The second loss function may be any one of a mean square error loss function, a cross entropy loss function, a log likelihood loss function, an exponential loss function, and the like, which is not limited in the present application.

In some embodiments, the first loss function value may be calculated according to a difference between the first classification category and the labeling classification category, and the first loss function value and the second loss function value may be weighted and summed to obtain a comprehensive loss function value, and parameters of the classification model may be adjusted with the objective of minimizing the comprehensive loss function value to obtain an adjusted classification model. The first loss function and the second loss function may be any one of a mean square error loss function, a cross entropy loss function, a log likelihood loss function, an exponential loss function, and the like, which is not limited in the present application.
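The weighted summation of the two loss values can be written out directly. The weights are not given in the application; the default values of 0.5 below are placeholders, and the function name is illustrative.

```python
def combined_loss(
    first_loss: float,
    second_loss: float,
    first_weight: float = 0.5,   # placeholder weight, not from the source
    second_weight: float = 0.5,  # placeholder weight, not from the source
) -> float:
    """Weighted sum of the first-pass and reclassification loss values."""
    if first_weight < 0 or second_weight < 0:
        raise ValueError("weights must be non-negative")
    return first_weight * first_loss + second_weight * second_loss
```

Raising the second weight would emphasize the reclassification that sees the output content, while the first weight keeps the text-only classification from degrading; how to balance them is a tuning choice the text leaves open.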

In some embodiments, a third classification category to which the input text belongs is determined from the plurality of classification categories by the adjusted classification model. And determining the prompting words corresponding to the third classification category according to the corresponding relation between the classification categories and the prompting words. And generating the adjusted output content corresponding to the input text through the generation model according to the input text and the prompt words of the third classification category.

Fig. 7 shows a content optimization process based on a classification model, wherein a first classification category is obtained according to an input text through a pre-trained classification model, and a first prompt word corresponding to the first classification category is determined according to the corresponding relation between a plurality of classification categories and the prompt word. And adding the first prompt word into the input data of the generation model, so that the generation model generates output content corresponding to the input text. And the output content flows back to the input stage of the classification model, and parameters of the classification model are adjusted based on the input text and the output content, so that an adjusted classification model is obtained. The adjusted classification model is used to reacquire the adjusted classification category to generate adjusted output content. By an automated content optimization method, the dependence on human resources can be significantly reduced, which is particularly important for processing large-scale data and content generation, and the efficiency of content generation can be improved and the cost of content generation can be reduced.

By retraining the classification model through the steps, the classification capability of the classification model can be further improved, the accuracy of the classification model in judging the classification is improved, and the probability of misclassification is reduced. Therefore, the adjusted classification model is applied to the content generation method, the accuracy and the safety of content generation can be improved, and the probability of generating output content containing illegal content due to misclassification is reduced.

The following are examples of the apparatus of the present application that may be used to perform the method embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the embodiments of the method of the present application.

Referring to fig. 8, a block diagram of a content generating apparatus according to an embodiment of the present application is shown. The apparatus has functions for implementing the above content generation method; the functions may be implemented by hardware, or by hardware executing corresponding software. The apparatus may be the computer device described above or may be provided in a computer device. As shown in fig. 8, the apparatus 800 may include: a text acquisition module 810, a category determination module 820, a prompt word determination module 830, and a content output module 840.

The text acquisition module 810 is configured to acquire an input text.

The category determining module 820 is configured to determine a first classification category to which the input text belongs from a plurality of classification categories, where the plurality of classification categories include at least one violation category and one compliance category, the violation category is a category to which the violation content included in the input text belongs, and the compliance category is that the violation content is not included in the input text.

The prompt word determining module 830 is configured to determine a prompt word corresponding to the first classification category, to obtain a first prompt word, where the first prompt word is used to indicate a content generation requirement corresponding to the input text.

The content output module 840 is configured to generate output content corresponding to the input text according to the input text and the first prompt word through a generation model.

In some embodiments, the prompt word determining module 830 is configured to determine, according to the correspondence between the plurality of classification categories and the prompt word, the prompt word corresponding to the first classification category, and obtain the first prompt word.

In some embodiments, the first classification category is determined by a classification model, the apparatus 800 further comprising a model adjustment module for determining a second classification category from the plurality of classification categories based on the input text and the output content by the classification model; and adjusting parameters of the classification model according to the difference between the second classification category and the labeling classification category to which the input text belongs to obtain an adjusted classification model, wherein the labeling classification category is a classification category to which the input text belongs to be labeled in advance, and the adjusted classification model is used for obtaining the adjusted classification category to which the input text belongs according to the input text to generate adjusted output content.

In some embodiments, the model adjustment module is configured to perform feature extraction on the input text and the output content through a feature extractor in the classification model, so as to obtain feature information of the input text and feature information of the output content; obtain a fusion feature according to the feature information of the input text and the feature information of the output content; and discriminate, by a feature discriminator in the classification model according to the fusion feature, the classification category to which the input text belongs, to obtain the second classification category.

In some embodiments, the model adjustment module is configured to splice the feature information of the input text and the feature information of the output content to obtain the fusion feature; or determining the difference characteristic of the characteristic information of the input text relative to the characteristic information of the output content; and obtaining the fusion characteristic according to the characteristic information of the input text, the characteristic information of the output content and the difference characteristic.

In some embodiments, the model adjustment module is configured to perform feature extraction on the first prompt word through the feature extractor to obtain feature information of the first prompt word; splice the feature information of the input text and the feature information of the first prompt word to obtain a spliced feature; and determine the distance between the spliced feature and the feature information of the output content to obtain the difference feature.

In some embodiments, the first classification category is determined by a classification model comprising a feature extractor and a feature discriminator; the category determining module 820 is configured to perform feature extraction on the input text by using the feature extractor, so as to obtain feature information of the input text; and discriminate, by the feature discriminator according to the feature information of the input text, the classification category to which the input text belongs, to obtain the first classification category.

In some embodiments, the category determining module 820 is configured to detect the input text according to the feature information of the input text by using the feature discriminator; if the input text contains the violation content, determine the category to which the violation content belongs, and determine the violation category to which the violation content belongs as the first classification category; and if the input text does not contain the violation content, determine the compliance category as the first classification category.

In some embodiments, the content output module 840 is configured to generate, by using a text generation model, output content in a text form corresponding to the input text according to the input text and the first prompt word; or generating output content in an image form corresponding to the input text according to the input text and the first prompt word through an image generation model.

It should be noted that, when the apparatus provided in the foregoing embodiment implements its functions, the division into the foregoing functional modules is used only as an example. In practical applications, the functions may be allocated to different functional modules as required; that is, the internal structure of the device may be divided into different functional modules to implement all or part of the functions described above. In addition, the apparatus embodiments and the method embodiments provided above belong to the same concept; for the specific implementation process of the apparatus, refer to the method embodiments, which are not repeated herein.

Referring to FIG. 9, a block diagram of a computer device 900 according to one embodiment of the application is shown. The computer device 900 may be any electronic device having data computing, processing, and storage capabilities. The computer apparatus 900 may be used to implement the content generation method provided in the above-described embodiments.

In general, the computer device 900 includes: a processor 901 and a memory 902.

The processor 901 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 901 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field Programmable Gate Array), and PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor. The main processor, also referred to as a CPU (Central Processing Unit), processes data in the awake state; the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 901 may integrate a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 901 may also include an AI processor for handling computing operations related to machine learning.

The memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 902 is used to store a computer program configured to be executed by one or more processors to implement the content generation methods described above.

Those skilled in the art will appreciate that the architecture shown in fig. 9 is not limiting of the computer device 900, and may include more or fewer components than shown, or may combine certain components, or employ a different arrangement of components.

In an exemplary embodiment, a computer-readable storage medium is also provided, in which a computer program is stored; when executed by a processor of a computer device, the computer program implements the above content generation method. Optionally, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, or the like.

In an exemplary embodiment, a computer program product is also provided, the computer program product comprising a computer program stored in a computer readable storage medium. A processor of a computer device reads the computer program from a computer-readable storage medium, and the processor executes the computer program so that the computer device performs the content generating method described above.

It should be noted that, before and while collecting relevant user data, the present application may display a prompt interface or popup window, or output voice prompt information, to inform the user that relevant data is currently being collected. The present application starts to execute the steps of obtaining the relevant user data only after obtaining the user's confirmation operation on the prompt interface or popup window; otherwise (that is, when no confirmation operation on the prompt interface or popup window is obtained), the steps of obtaining the relevant user data end, and the relevant user data is not obtained. In other words, all user data collected by the present application is processed strictly in accordance with the requirements of relevant national laws and regulations; informed consent or separate consent of the personal information subject is obtained, and data is collected only when the user agrees and authorizes; subsequent use and processing of the data is carried out within the scope of laws and regulations and of the authorization of the personal information subject; and the collection, use, and processing of relevant user data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.

It should be understood that "a plurality" as referred to herein means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may indicate: A exists alone, A and B exist together, or B exists alone. The character "/" generally indicates an "or" relationship between the associated objects before and after it. In addition, the step numbers described herein merely show one possible execution order among the steps; in some other embodiments, the steps may be executed out of numerical order, for example, two differently numbered steps may be executed simultaneously, or in an order opposite to that shown, which is not limited in the embodiments of the present application.

The foregoing description of the exemplary embodiments of the application is not intended to limit the application to the particular embodiments disclosed, but on the contrary, the intention is to cover all modifications, equivalents, and alternatives falling within the spirit and scope of the application.

Claims

1. A content generation method, the method comprising:

acquiring an input text;

determining, through a classification model, a first classification category to which the input text belongs from a plurality of classification categories, wherein the plurality of classification categories comprise at least one violation category and one compliance category, the violation category refers to a category to which violation content contained in the input text belongs, and the compliance category means that the input text does not contain violation content;

determining a prompt word corresponding to the first classification category to obtain a first prompt word, wherein the first prompt word is used for indicating a content generation requirement corresponding to the input text;

generating, through a generation model, output content corresponding to the input text according to the input text and the first prompt word, wherein the content generation requirement is that the output content does not contain violation content corresponding to the first classification category;

performing feature extraction on the input text and on the output content, respectively, through a feature extractor in the classification model, to obtain feature information of the input text and feature information of the output content; performing feature extraction on the first prompt word through the feature extractor to obtain feature information of the first prompt word;

splicing the feature information of the input text and the feature information of the first prompt word to obtain a spliced feature; determining a distance between the spliced feature and the feature information of the output content to obtain a difference feature; obtaining a fusion feature according to the feature information of the input text, the feature information of the output content and the difference feature;

discriminating, through a feature discriminator in the classification model and according to the fusion feature, the classification category to which the input text belongs, to obtain a second classification category;

and adjusting parameters of the classification model according to a difference between the second classification category and a labeled classification category to which the input text belongs, to obtain an adjusted classification model, wherein the labeled classification category is a classification category, labeled in advance, to which the input text belongs, and the adjusted classification model is used to obtain, according to the input text, an adjusted classification category to which the input text belongs, so as to generate adjusted output content.

2. The method of claim 1, wherein the determining a prompt word corresponding to the first classification category to obtain a first prompt word comprises:

determining the prompt word corresponding to the first classification category according to a correspondence between the classification categories and prompt words, to obtain the first prompt word.

3. The method of claim 1, wherein the determining a first classification category to which the input text belongs from a plurality of classification categories comprises:

performing feature extraction on the input text through the feature extractor to obtain the feature information of the input text;

and discriminating, through the feature discriminator and according to the feature information of the input text, the classification category to which the input text belongs, to obtain the first classification category.

4. The method according to claim 3, wherein the discriminating, through the feature discriminator and according to the feature information of the input text, the classification category to which the input text belongs, to obtain the first classification category comprises:

detecting the input text through the feature discriminator according to the feature information of the input text; if the input text contains violation content, discriminating the category to which the violation content belongs, and determining the violation category to which the violation content belongs as the first classification category;

and if the input text does not contain violation content, determining the compliance category as the first classification category.

5. The method according to claim 1, wherein the generating, through a generation model, output content corresponding to the input text according to the input text and the first prompt word comprises:

generating, through a text generation model, output content in text form corresponding to the input text according to the input text and the first prompt word;

or

generating, through an image generation model, output content in image form corresponding to the input text according to the input text and the first prompt word.

6. A content generation apparatus, the apparatus comprising:

a text acquisition module, configured to acquire an input text;

a classification determining module, configured to determine, through a classification model, a first classification category to which the input text belongs from a plurality of classification categories, wherein the plurality of classification categories comprise at least one violation category and one compliance category, the violation category refers to a category to which violation content contained in the input text belongs, and the compliance category means that the input text does not contain violation content;

a prompt word determining module, configured to determine a prompt word corresponding to the first classification category to obtain a first prompt word, wherein the first prompt word is used for indicating a content generation requirement corresponding to the input text;

a content output module, configured to generate, through a generation model, output content corresponding to the input text according to the input text and the first prompt word, wherein the content generation requirement is that the output content does not contain violation content corresponding to the first classification category;

a model adjustment module, configured to: perform feature extraction on the input text and on the output content, respectively, through a feature extractor in the classification model, to obtain feature information of the input text and feature information of the output content; perform feature extraction on the first prompt word through the feature extractor to obtain feature information of the first prompt word; splice the feature information of the input text and the feature information of the first prompt word to obtain a spliced feature; determine a distance between the spliced feature and the feature information of the output content to obtain a difference feature; obtain a fusion feature according to the feature information of the input text, the feature information of the output content and the difference feature; discriminate, through a feature discriminator in the classification model and according to the fusion feature, the classification category to which the input text belongs, to obtain a second classification category; and adjust parameters of the classification model according to a difference between the second classification category and a labeled classification category to which the input text belongs, to obtain an adjusted classification model, wherein the labeled classification category is a classification category, labeled in advance, to which the input text belongs, and the adjusted classification model is used to obtain, according to the input text, an adjusted classification category to which the input text belongs, so as to generate adjusted output content.

7. A computer device comprising a processor and a memory, the memory having stored therein a computer program that is loaded and executed by the processor to implement the content generation method of any of claims 1 to 5.

8. A computer-readable storage medium, in which a computer program is stored, the computer program being loaded and executed by a processor to implement the content generation method of any one of claims 1 to 5.

9. A computer program product, characterized in that the computer program product comprises a computer program that is loaded and executed by a processor to implement the content generation method of any of claims 1 to 5.