CN117171325A - Task processing method and server

Publication number: CN117171325A
Application number: CN202311205678.9A
Original language: Chinese (zh)
Inventors: 黄申, 马仕镕, 王潇斌, 蒋勇, 谢朋峻
Applicant and current assignee: Hangzhou Alibaba Cloud Feitian Information Technology Co., Ltd.
Legal status: Pending

Abstract

The present application provides a task processing method and a server. The method continues pre-training a pre-training model with mixed pre-training data of a vertical domain and the general domain to obtain a pre-trained domain base model, so that the model becomes familiar with the text formats and data distribution of the vertical domain while retaining, as much as possible, the strong semantic understanding and representation capability of the original pre-training model, and more vertical-domain knowledge is injected into the model, thereby enhancing its text understanding and generation capability in the vertical domain. The domain base model is then fine-tuned with a data set of the vertical domain to obtain a domain model suitable for the vertical domain and applicable to various natural language processing tasks in that domain. The improved text understanding and generation capability of the model in the vertical domain in turn improves the generation quality of task processing results when the model is applied to various NLP tasks in the vertical domain.

Description

Task processing method and server
Technical Field
The present application relates to computer technologies, and in particular, to a task processing method and a server.
Background
In recent years, natural language processing (Natural Language Processing, NLP) technology has developed rapidly. Deep learning models for NLP, such as large language models (Large Language Model, LLM), unify different natural language processing tasks into a single text-generation formulation and achieve good results on a large number of natural language processing tasks.
However, mainstream deep learning models for NLP (such as LLMs) perform poorly on various tasks in vertical domains (such as e-commerce, medical, education, science and technology, and finance), and the quality of the processing results they generate is low.
Disclosure of Invention
The present application provides a task processing method and a server, which are used to solve the problem that large language models perform poorly on various tasks in vertical domains and generate processing results of low quality.
In a first aspect, the present application provides a task processing method, including: continuing to pre-train a pre-training model by using mixed pre-training data of a vertical domain and a general domain to obtain a pre-trained domain base model; and performing fine-tuning training on the domain base model by using a data set of the vertical domain to obtain a domain model suitable for the vertical domain, wherein the domain model is used for executing natural language processing tasks of the vertical domain and generating corresponding task processing results.
In a second aspect, the present application provides a task processing method, including: using mixed pre-training data of the e-commerce field and the general field to continuously pre-train the pre-training large model to obtain a basic large model of the pre-trained e-commerce field; and performing fine tuning training on the basic large model in the E-commerce field by using the data set in the E-commerce field to obtain the large model in the E-commerce field, wherein the large model in the E-commerce field is used for executing natural language processing tasks in the E-commerce field and generating corresponding task processing results.
In a third aspect, the present application provides a task processing method, applied to a server, including: receiving a call request for a domain model sent by a terminal side device, wherein the call request comprises a task instruction, and the task instruction is generated according to task prompt format information and input data of a natural language processing task to be executed; inputting the task instruction into the domain model, executing task processing based on the task instruction through the domain model, and generating a task processing result, wherein the domain model is obtained by using mixed pre-training data of a vertical domain and a general domain, continuously pre-training the pre-training model to obtain a pre-trained domain basic model, and performing fine-tuning training on the domain basic model by using a data set of the vertical domain; and returning the task processing result to the end-side equipment.
In a fourth aspect, the present application provides a server comprising: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to cause the server to perform the method of any of the above aspects.
According to the task processing method and server of the present application, the pre-training model is continually pre-trained with mixed pre-training data of the vertical domain and the general domain to obtain a pre-trained domain base model, so that the model becomes familiar with the text formats and data distribution of the vertical domain while retaining, as much as possible, the strong semantic understanding and representation capability of the original pre-training model, and more vertical-domain knowledge is injected into the model, thereby enhancing its text understanding and generation capability in the vertical domain. The domain base model is then fine-tuned with a data set of the vertical domain to obtain a domain model suitable for the vertical domain and applicable to various natural language processing tasks in that domain. The improved text understanding and generation capability of the model in the vertical domain in turn improves the generation quality of task processing results when the model is applied to various NLP tasks in the vertical domain.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
FIG. 1 is a schematic diagram of an exemplary system architecture to which the present application is applicable;
FIG. 2 is a schematic diagram of another example system architecture to which the present application is applicable;
FIG. 3 is a flow chart of a task processing method according to an exemplary embodiment of the present application;
FIG. 4 is a flow chart for constructing pre-training data for a vertical domain provided by an exemplary embodiment of the present application;
FIG. 5 is a flowchart for constructing a vertical domain multi-tasking instruction data set provided by an exemplary embodiment of the present application;
FIG. 6 is a schematic diagram of a fine tuning training framework for multi-tasking instruction data provided by an exemplary embodiment of the present application;
FIG. 7 is a block diagram of a domain model for acquiring vertical domains in accordance with an exemplary embodiment of the present application;
FIG. 8 is a flowchart of a task processing method in the e-commerce domain according to an exemplary embodiment of the present application;
FIG. 9 is a flowchart of a task process implemented based on a domain model provided by an exemplary embodiment of the present application;
FIG. 10 is a flowchart of a task process implemented based on a domain model provided by another exemplary embodiment of the present application;
FIG. 11 is a schematic diagram of a flow framework of a commodity comparison task according to an exemplary embodiment of the present application;
FIG. 12 is a flow chart of a commodity comparison method according to an exemplary embodiment of the present application;
FIG. 13 is an exemplary diagram of an interface for displaying product comparison results according to an exemplary embodiment of the present application;
fig. 14 is a schematic structural diagram of a server according to an exemplary embodiment of the present application.
Specific embodiments of the present application have been shown by way of the above drawings and will be described in more detail below. The drawings and the written description are not intended to limit the scope of the inventive concepts in any way, but rather to illustrate the inventive concepts to those skilled in the art by reference to the specific embodiments.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
It should be noted that, the user information (including but not limited to user equipment information, user attribute information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or fully authorized by each party, and the collection, use and processing of the related data need to comply with related laws and regulations and standards, and provide corresponding operation entries for the user to select authorization or rejection.
First, the terms involved in the present application will be explained:
Large language model (Large Language Model, LLM for short): also known as a large-scale language model, a deep neural network model with a large number of parameters trained on text, capable of understanding and generating natural language text.
Pre-trained large model: a pre-training model obtained by pre-training a large model, such as a large-scale language model (LLM).
Domain continued pre-training: starting from a pre-training model of the general domain, continuing pre-training on unlabeled corpora of a specific domain using pre-training tasks.
Vertical domain: an Internet-industry term referring to a field that provides specific services to a defined group, including the e-commerce, entertainment, medical, environmental protection, education, and sports fields, among others. A vertical domain is a narrower field divided vertically under one broad domain.
General domain: in this application, a concept opposite to the vertical domain. The general domain spans multiple fields, can cover multiple vertical domains, and can be understood as a more general or common domain for which a large amount of corpus data already exists. Data of the general domain may cover data of one or more vertical domains.
Multi-task instruction fine-tuning: a method of fine-tuning a pre-training model based on multi-task instruction data, so that the trained model can respond accurately to task instructions given by humans and thus better solve tasks driven by various task instructions.
Named entity recognition (Named Entity Recognition, simply NER): named entities having a particular meaning are identified from the text.
Relationship extraction (Relation Extraction, RE for short): relationships between text segments are identified from the text.
Chain of Thought (CoT): a method that, when performing complex reasoning, gives a natural language description of the reasoning steps and derives the final result step by step. For example, in a commodity comparison task, two commodities are compared based on their related information and a recommendation meeting the user's requirements is given. One chain of thought for solving this task may be: based on the information of the two commodities, give the characteristics, similarities, and differences of the two commodities, describe the advantages of each, and finally give the recommended commodity according to the user's requirements. The chain-of-thought method makes the model reason about the task step by step and provides the important reasoning process along with the final result.
Visual question-answering task: from the input image and the question, an answer to the question is determined from visual information of the input image.
Image description task: descriptive text of the input image is generated.
Visual entailment task: predicting the semantic relationship between an input image and a text, i.e., entailment, neutrality, or contradiction.
Referring expression comprehension task: locating, in an input image, the image region corresponding to the input information.
Image generation tasks: an image is generated based on the entered descriptive text.
Text-based emotion classification tasks: and predicting emotion classification information of the input information.
Text summarization task: summary information of the input information is generated.
Multimodal tasks: downstream tasks whose input and output data involve data of multiple modalities such as images and text, for example the visual question-answering task, image description task, visual entailment task, referring expression comprehension task, and image generation task.
Multimodal pre-training model: a pre-training model whose input and output data involve data of multiple modalities such as images and text, which can be applied to multimodal task processing after fine-tuning training.
Large models refer to deep learning models with large-scale model parameters, typically containing hundreds of millions or even hundreds of billions of parameters. A large model may also be called a foundation model or base model (Foundation Model, FM). A large model is pre-trained on large-scale unlabeled corpora to produce a pre-trained model with more than a hundred million parameters; such a model can adapt to a wide range of downstream tasks and has good generalization capability, for example a large language model (LLM) or a multimodal pre-training model.
The large model can be widely applied to the fields of natural language processing, computer vision and the like, and particularly can be applied to the tasks of the computer vision fields such as vision question and answer (Visual Question Answering, VQA for short), image description (IC for short), image generation and the like, and the tasks of the natural language processing fields such as emotion classification based on texts, text abstract generation, machine translation and the like, and the main application scenes of the large model comprise digital assistants, intelligent robots, searching, online education, office software, electronic commerce, intelligent design and the like.
The pre-training model (such as a pre-training language model, a pre-training large language model and the like) commonly used in the natural language processing field has strong natural language understanding capability, can execute various natural language processing tasks such as information extraction (such as Named Entity Recognition (NER), relation Extraction (RE) and the like), text classification, text generation and the like based on the generation capability of the pre-training model, and has wide application in various scenes such as electronic commerce, medical treatment, intelligent transportation, online education, digital assistant and the like. For example, intelligent questions and answers of related information of commodities, commodity comparison, commodity search and the like are realized in the e-commerce field, intelligent customer service of related knowledge of education is realized in the online education field, text subject content extraction and the like.
However, the multi-task general pre-training model in the natural language processing field is obtained by pre-training on general-domain pre-training data. Because the scale of the labeled data used for fine-tuning is small, a model obtained by fine-tuning it with a small amount of fine-tuning data still performs unsatisfactorily on various tasks in vertical domains (such as the e-commerce, medical, education, and science and technology fields), and the quality of the generated processing results is low.
Taking the e-commerce field as an example, due to the huge commercial value in the e-commerce field, many technicians study how to solve many problems in the e-commerce field by using natural language processing technology, such as commodity information extraction, user inquiry understanding, commodity content generation, intelligent customer service dialogue and the like. However, the generic large language model LLM does not solve various NLP tasks in the e-commerce domain well, because the text in the e-commerce domain has the following characteristics compared to the text in the generic domain:
1. many texts in the e-commerce field are not coherent and smooth sentences, but have unique structures. For example, a commodity title is typically a continuous concatenation of important entities or concepts, while a commodity's attribute list is typically a semi-structured Key-Value pair (Key-Value) list, not a complete sentence. This creates a great difficulty for generic LLM to understand e-commerce text.
2. There are a large number of novel entities and concepts in the e-commerce field, the distribution of which varies greatly from the general text and which are updated rapidly with changes in product and popularity trends. For example, the e-commerce field has many unique brands and merchant names, and the meaning behind these entities and the literal meaning may not be identical.
Because of the above characteristics of e-commerce text, a pre-training model trained on general-domain pre-training data, even after fine-tuning with a small amount of fine-tuning data, still generates task processing results of low quality when applied to natural language processing tasks in the e-commerce field.
Aiming at the above technical problems, the present application provides a task processing method that continues pre-training a pre-training model with mixed pre-training data of a vertical domain and the general domain to obtain a pre-trained domain base model, so that the model becomes familiar with the text formats and data distribution of the vertical domain while retaining, as much as possible, the strong semantic understanding and representation capability of the original pre-training model, and more vertical-domain knowledge is injected into the model, thereby enhancing its text understanding and generation capability in the vertical domain. The domain base model is then fine-tuned with a data set of the vertical domain to obtain a domain model suitable for the vertical domain and applicable to various natural language processing tasks in that domain, and the improved text understanding and generation capability of the model in the vertical domain greatly improves the generation quality of task processing results when the model is applied to various NLP tasks in the vertical domain. The pre-training model may be a model applied to natural language processing, such as various pre-training language models and pre-trained large language models.
In the present application, in order to distinguish from the pre-training data, the data set used for the fine-tuning training is referred to as a "fine-tuning data set". In general, the data set used for pre-training is data that does not contain labeling information, and the data used for fine-tuning training is data that contains labeling information. In the application, when the domain basic model is subjected to fine tuning training, a fine tuning data set of the vertical domain is used.
FIG. 1 is a schematic diagram of an exemplary system architecture to which the present application is applicable. As shown in fig. 1, the system architecture includes a server and an end-side device. The server and the end side equipment are provided with a communication link capable of communicating, so that communication connection between the server and the end side equipment can be realized.
The server is a device with computing capability deployed in the cloud or locally, such as a cloud cluster. The server stores pre-training models, pre-built hybrid pre-training data of the vertical domain and the general domain, and fine-tuning data sets of the vertical domain. The server is in charge of continuously pre-training the pre-training model by using mixed pre-training data of the vertical field and the universal field to obtain a pre-trained field basic model; and then, using a fine tuning data set in the vertical field to carry out fine tuning training on the field basic model, so as to obtain the field model applicable to the vertical field. The domain model can be applied to various natural language processing tasks in the vertical domain, including but not limited to tasks such as information extraction (e.g. named entity recognition NER, relation extraction RE, etc.), text classification, text generation, etc. Different types of tasks use different task prompt format information, and the task prompt format information is used for indicating task requirements, input data and output results to be executed by the domain model.
The terminal device may be an electronic device running a downstream application system, and specifically may be a hardware device with a network communication function, an operation function, and an information display function, which includes, but is not limited to, a smart phone, a tablet computer, a desktop computer, a local server, a cloud server, and the like. The end-side device requires natural language processing capabilities using domain models when running downstream applications. For example, the downstream application system operated by the end-side device may be a device that implements functions such as intelligent question-answering, information extraction, text classification, and text summarization in a vertical domain, and when implementing at least one function of the downstream application system, natural language processing capability of a domain model in the vertical domain is required.
Based on the system architecture shown in fig. 1, when a natural language processing task in a vertical field needs to be executed, an end side device acquires input data and task prompt format information of the natural language processing task, generates a task instruction according to the task prompt format information and the input data, and sends a call request for a field model to a server, wherein the call request contains the task instruction. The server receives the call request, acquires a task instruction to be executed, inputs the task instruction into a field model in the vertical field, executes task processing based on the task instruction through the field model, generates a task processing result, and returns the task processing result to the terminal side device. And the terminal side equipment receives the task processing result returned by the server, and continuously executes the processing logic of the downstream application system according to the task processing result to realize the function of the downstream application system.
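As a concrete illustration of this call flow, the sketch below shows how an end-side device might build a task instruction from task prompt format information and input data and send a call request to the server. The endpoint URL, JSON field names, and prompt template are illustrative assumptions; the application does not prescribe a concrete wire format.

import json
import urllib.request

def build_task_instruction(prompt_format: str, input_data: str) -> str:
    # Fill the task prompt format information with the input data of the NLP task.
    return prompt_format.format(input=input_data)

def call_domain_model(server_url: str, task_instruction: str) -> str:
    # Send a call request containing the task instruction; return the task processing result.
    payload = json.dumps({"task_instruction": task_instruction}).encode("utf-8")
    request = urllib.request.Request(
        server_url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["task_result"]

# Hypothetical usage: a named entity recognition task in the e-commerce domain.
prompt = "Extract the brand and product category entities from the following text: {input}"
instruction = build_task_instruction(prompt, "XYZ wireless noise-cancelling headphones")
# result = call_domain_model("https://example.invalid/domain-model", instruction)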
FIG. 2 is a schematic diagram of another example system architecture to which the present application is applicable. As shown in fig. 2, the system architecture includes a server and a model service. The communication link is arranged between the server and the model service, so that the communication connection between the server and the model service can be realized.
Wherein the model service is a service that can provide a pre-trained model. The model service may provide the server with download information of the pre-trained model, and the server downloads the pre-trained model from the model service to the local device according to the download information. Alternatively, the server may send a pre-training model acquisition request to the model service, and the model service sends the pre-training model to the server in response to the pre-training model acquisition request. In addition, the server may also obtain the pre-trained model from the model service through other interaction modes, which are not specifically limited herein.
The servers may be computing-capable devices deployed at the cloud or locally by various institutions or system platforms, such as cloud clusters and local servers. The server obtains a pre-training model from the model service and stores pre-built mixed pre-training data of the vertical domain and the general domain, as well as a fine-tuning data set of the vertical domain. The server is responsible for continuing to pre-train the pre-training model by using the mixed pre-training data of the vertical domain and the general domain to obtain a pre-trained domain base model, and then performing fine-tuning training on the domain base model by using the fine-tuning data set of the vertical domain to obtain a domain model applicable to the vertical domain. The domain model can be applied to various natural language processing tasks in the vertical domain, including but not limited to information extraction (e.g., named entity recognition NER, relation extraction RE), text classification, and text generation. Different types of tasks use different task prompt format information, which indicates the task requirements, input data, and output results expected of the domain model.
Based on the architecture shown in fig. 2, the server also runs a downstream application system that uses the domain model of the vertical domain. When a server runs a downstream application system and needs to execute a natural language processing task in the vertical field, acquiring input data and task prompt format information of the natural language processing task, generating a task instruction according to the task prompt format information and the input data, inputting the task instruction into a field model in the vertical field, executing task processing based on the task instruction through the field model, generating a task processing result and returning the task processing result to the downstream application system. And the server continuously executes subsequent processing logic of the downstream application system based on the task processing result to realize the functions of the downstream application system.
The following describes the technical scheme of the present application and how the technical scheme of the present application solves the above technical problems in detail with specific embodiments. The following embodiments may be combined with each other, and the same or similar concepts or processes may not be described in detail in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
Fig. 3 is a flowchart of a task processing method according to an exemplary embodiment of the present application. The execution body of the embodiment is a server in the system architecture shown in fig. 1 or fig. 2. The method of the embodiment aims at continuously pre-training and fine-tuning the pre-training model according to the specific vertical field to which the user belongs to obtain the field model applicable to the specific vertical field. As shown in fig. 3, the method specifically comprises the following steps:
Step S31: continue pre-training the pre-training model by using mixed pre-training data of the vertical domain and the general domain to obtain a pre-trained domain base model.
The pre-training model can be a model applied to natural language processing, such as various pre-training language models, a pre-training large language model and the like.
The vertical field in this embodiment refers to a specific field to which the pre-training model needs to be applied according to the actual application scene requirement, for example, an e-commerce field, a medical field, a traffic field, a sports field, and the like. The general domain is a concept opposite to the vertical domain, and the general domain can be understood as a relatively general or common domain in which a large amount of existing corpus data exists, and the general domain generally includes a plurality of domains and can cover a plurality of vertical domains.
The vertical domain pre-training data refers to data for model pre-training constructed based on the vertical domain data. The pre-training data of the general field refers to data constructed for model pre-training based on the data of the general field.
The construction mode of the pre-training data in the general field is consistent with the construction mode of the training data used in the pre-training stage of the pre-training model, and is not repeated here. In addition, the pre-training data in the general field can directly adopt training data used in a pre-training stage of the pre-training model.
The construction of the pre-training data of the vertical domain is similar to that of the general domain; the difference is only in the data source: the vertical-domain pre-training data is constructed solely from data of the vertical domain, whereas the general-domain pre-training data is constructed from data of the general domain. The construction flow itself is the same and is consistent with how the training data was constructed in the pre-training stage of the pre-training model, so it is not repeated here.
The mixed pre-training data of the vertical field and the universal field is obtained by mixing the pre-training data of the vertical field with the pre-training data of the universal field. Specifically, the pre-training data of the vertical domain and the pre-training data of the general domain may be combined into one training set, which is used as a training set for continuing the pre-training of the pre-training model, and the training set contains the mixed pre-training data of the vertical domain and the general domain.
Optionally, for the training set obtained by combining the pre-training data in the vertical field and the universal field, the sequence of the training data in the training set can be further disturbed, so that the pre-training data in the vertical field and the pre-training data in the universal field are fully mixed, and the training effect of continuous pre-training in the field of the pre-training model can be improved.
When constructing the mixed pre-training data of the vertical domain and the general domain, the proportion of vertical-domain pre-training data should be neither too small nor too large. If the proportion of vertical-domain pre-training data is too small, the model learns insufficient vertical-domain knowledge, which affects its text understanding and generation capability in the vertical domain. If the proportion is too large, the model cannot well retain the strong semantic understanding and representation capability of the original pre-training model, which also affects its text understanding and generation capability in the vertical domain. When the vertical-domain pre-training data accounts for about 1/2, i.e., the vertical domain and the general domain are mixed at a ratio of about 1:1, continuing to pre-train the pre-training model on the resulting mixed data allows the model to become familiar with the text formats and data distribution of the vertical domain while retaining, as much as possible, the strong semantic understanding and representation capability of the original pre-training model, and injects more vertical-domain knowledge into the model, thereby enhancing its text understanding and generation capability in the vertical domain. In some embodiments, the training samples in the mixed pre-training data are unsupervised text, i.e., sample text without labels.
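The following is a minimal sketch of how the mixed pre-training data described above could be assembled, assuming both corpora are already available as lists of unlabeled text samples; the 1:1 default ratio and the shuffling follow the description, while the function name and seed handling are illustrative.

import random

def build_mixed_pretraining_data(vertical_samples, general_samples, mix_ratio=1.0, seed=42):
    # Mix vertical-domain and general-domain samples at roughly mix_ratio : 1
    # (vertical : general) and shuffle so the two sources are fully interleaved.
    rng = random.Random(seed)
    n_general = min(len(general_samples), int(len(vertical_samples) / mix_ratio))
    mixed = list(vertical_samples) + rng.sample(general_samples, n_general)
    rng.shuffle(mixed)  # fully interleave vertical and general samples
    return mixed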
In the step, the mixed pre-training data of the vertical field and the universal field is used for continuously pre-training the pre-training model to obtain a pre-trained field basic model, and the continuous pre-training process can adopt a training mode consistent with the pre-training process of the pre-training model. Of course, the pre-training model is continued in this step, and a training mode different from the pre-training of the pre-training model may also be adopted.
In this step, the mixed pre-training data of the vertical domain and the general domain may be used to perform any of the following pre-training tasks on the pre-training model: a causal language modeling (Causal Language Model, CLM for short) task or a masked language modeling task, so as to continue pre-training the pre-training model and obtain a pre-trained domain base model.
The causal language modeling tasks include a forward causal language modeling task and an inverse causal language modeling task. The mixed pre-training data includes a plurality of training samples, and each training sample includes a plurality of words (also referred to as tokens). The forward causal language modeling task predicts a subsequent word in a training sample from the preceding words; the inverse causal language modeling task predicts a preceding word in a training sample from the following words. The masked language modeling task predicts the words at masked positions in a training sample from the words at non-masked positions.
Illustratively, in this step, a causal language modeling pre-training task (e.g., a forward causal language modeling pre-training task, or an inverse causal language modeling pre-training task) may be performed on the pre-training model using the hybrid pre-training data of the vertical domain and the universal domain, to enable continued pre-training of the pre-training model, resulting in a pre-trained domain base model.
Taking the forward causal language modeling pre-training task as an example, the server samples training samples from the mixed pre-training data and performs word segmentation on each training sample to obtain its words; determines the word vectors corresponding to the words in the training sample; encodes each word vector through the pre-training model to obtain an encoded vector for each word; and then, for any word in the training sample, predicts from the encoded vectors of that word and of the historical words to obtain a forward prediction result for that word. That is, the forward prediction result of any word is the word located immediately after it in the training sample, and the historical words are the words preceding it. Illustratively, the pre-training model predicts a first word X1 from the beginning symbol (BOS) in a training sample, then predicts a second word X2 from the BOS and the first word X1, then predicts a third word X3 from the BOS, X1, and X2, and so on, until the last word or the end symbol (EOS), which marks the end of the training sample, is output from the BOS and all preceding words. Further, a forward pre-training loss function is determined from the forward prediction results; this loss function characterizes the loss of the forward causal language modeling pre-training task.
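The forward causal language modeling objective described above can be sketched as follows in PyTorch, assuming an autoregressive model that maps token ids to next-token logits; the tensor shapes and helper name are assumptions for illustration, not details given in the application.

import torch
import torch.nn.functional as F

def forward_clm_loss(model, input_ids: torch.Tensor) -> torch.Tensor:
    # input_ids: (batch, seq_len) token ids, including BOS/EOS markers.
    logits = model(input_ids)                # (batch, seq_len, vocab_size)
    # Predict token t+1 from tokens up to t: shift logits left and labels right.
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = input_ids[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
    )

For the inverse causal language modeling task, the same loss could be computed on the reversed token sequence.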
In the step, the pre-training model is continuously pre-trained by using the mixed pre-training data of the vertical field and the universal field, so that the model is familiar with the characteristics of text format and data distribution of the vertical field on the basis of keeping the original strong semantic understanding and representing capability of the pre-training model as much as possible, and more knowledge of the vertical field is injected into the model, thereby enhancing the text understanding and generating capability of the model in the vertical field.
Step S32: perform fine-tuning training on the domain base model by using the data set of the vertical domain to obtain a domain model suitable for the vertical domain, wherein the domain model is used for executing natural language processing tasks of the vertical domain and generating corresponding task processing results.
In this embodiment, in order to distinguish from the pre-training data, the data set used for the fine-tuning training is referred to as a "fine-tuning data set". In general, the data set used for pre-training is data that does not contain labeling information, and the data used for fine-tuning training is data that contains labeling information.
After the pre-training model is continuously pre-trained by using mixed pre-training data of the vertical field and the general field to obtain a pre-trained field basic model, the field basic model can be subjected to fine-tuning training by using a fine-tuning data set of the vertical field to obtain a field model suitable for the vertical field, and the finally obtained field model can be better suitable for natural language processing tasks of the vertical field by fine-tuning training of the field basic model, so that the performance of the model is further improved.
The method of this embodiment continues pre-training the pre-training model with mixed pre-training data of the vertical domain and the general domain to obtain a pre-trained domain base model, so that the model becomes familiar with the text formats and data distribution of the vertical domain while retaining, as much as possible, the strong semantic understanding and representation capability of the original pre-training model, and more vertical-domain knowledge is injected into the model, thereby enhancing its text understanding and generation capability in the vertical domain. Further, the domain base model is fine-tuned with the fine-tuning data set of the vertical domain to obtain a domain model suitable for the vertical domain. The resulting domain model is better suited to natural language processing tasks in the vertical domain, which further improves model performance and thus greatly improves the generation quality of task processing results when the model is applied to various NLP tasks in the vertical domain.
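As a rough sketch of the fine-tuning step, the snippet below computes a supervised loss over (task instruction, target output) pairs in the common instruction-tuning formulation; masking the instruction tokens out of the loss is a conventional choice assumed here, not something the application specifies.

import torch
import torch.nn.functional as F

def instruction_finetune_loss(model, instruction_ids, target_ids):
    # instruction_ids, target_ids: 1D tensors of token ids for one labeled sample.
    input_ids = torch.cat([instruction_ids, target_ids]).unsqueeze(0)
    labels = input_ids.clone()
    labels[:, : instruction_ids.size(0)] = -100   # ignore instruction positions in the loss
    logits = model(input_ids)                     # (1, seq_len, vocab_size)
    return F.cross_entropy(
        logits[:, :-1, :].reshape(-1, logits.size(-1)),
        labels[:, 1:].reshape(-1),
        ignore_index=-100,                        # only target tokens contribute
    )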
In an alternative embodiment, before continuing to pre-train the pre-training model with the mixed pre-training data of the vertical domain and the general domain in step S31, the server pre-builds the mixed pre-training data. This may be implemented as follows: construct the pre-training data of the vertical domain, and mix it with the pre-training data of the general domain according to a preset mixing mode to obtain the mixed pre-training data. The preset mixing mode is a pre-configured mode for mixing the pre-training data of the vertical domain and the general domain. In this embodiment, a plurality of available mixing modes and their corresponding mixing rules are pre-configured. In practical applications, a user can select one of the mixing modes as the preset mixing mode according to the user's application requirements.
When the user triggers the selection of a mixing mode, the server displays the configured mixing modes and their corresponding mixing rules through a mixing-mode configuration interface. The user can browse the configured mixing modes and their rules through this interface and select the mixing mode suited to the user's application requirements as the preset mixing mode used in the model training process that produces the domain model of the vertical domain. The user may select one of the mixing modes and trigger the selection operation; in response to the selection operation on any mixing mode, the server takes the selected mixing mode as the preset mixing mode currently in use. The selection operation may be the user clicking a "select" control in the interface, the user clicking a displayed mixing mode in the interface, or another operation with a similar option-selection function, which is not specifically limited herein.
In this embodiment, the user is supported to edit and update the mixing rule corresponding to the configured mixing mode. Specifically, the server displays the configured mixing modes through the mixing mode configuration interface, and displays mixing rules corresponding to the various mixing modes. The user may edit the displayed mixing rules corresponding to the one or more mixing manners in the mixing manner configuration interface to modify the specific mixing rules corresponding to the mixing manners.
And responding to the editing operation of the mixing rule corresponding to any mixing mode, and obtaining a new mixing rule of the mixing mode after editing by the server and updating the mixing rule corresponding to the mixing mode. Further, the user may select a mixing manner after updating the mixing rule as a preset mixing manner used in a model training process for obtaining a domain model of the vertical domain.
Optionally, a typical mixing mode is to mix the pre-training data of the vertical domain with the pre-training data of the general domain at a preset mixing ratio to obtain the mixed pre-training data. The mixing rule corresponding to this mixing mode is configured with a preset mixing ratio, i.e., a pre-configured ratio between the vertical-domain and general-domain pre-training data. A better training effect is obtained when the mixing ratio is 1:1 or close to 1:1; the specific value can be configured and adjusted according to the needs of the actual application scenario and is not specifically limited herein. For example, the preset mixing ratio may be configured as 1:1, 2:3, 2:1, and so on. In addition, the server sets a default value for the preset mixing ratio, and technicians can adjust and update the currently configured ratio according to the needs of the actual application scenario; the default value may be 1:1 or another value. Furthermore, based on different mixing ratios, multiple ratio-based mixing modes can be provided so that users can select the ratio suited to their application requirements.
In practical applications, the vertical domain may comprise a plurality of sub-domains. For example, the e-commerce domain includes sub-domains such as commodity search, commodity comparison, picture search, and short video presentation. Accordingly, the vertical-domain pre-training data includes pre-training data of a plurality of sub-domains. Another typical mixing mode is to mix, according to the application domain of the domain model, the pre-training data of the sub-domain corresponding to that application domain with the pre-training data of the general domain to obtain the mixed pre-training data. This mixing mode strengthens the data characteristics of the sub-domain, so that the trained domain model can better learn those characteristics and better fit the specific application domain.
For example, if the user determines that the specific application of the domain model in the e-commerce domain is the commodity-comparison sub-domain, then in the pre-training stage of the domain model, mixed data consisting of pre-training data of the commodity-comparison sub-domain and pre-training data of the general domain can be used, so that the model learns the data characteristics of the commodity-comparison sub-domain, improving the performance and generation quality of the resulting domain model when applied to commodity comparison tasks.
In an alternative embodiment, the pre-training data of the vertical domain may be constructed specifically by:
step S1, collecting various types of sample data in the vertical field.
In this step, a large amount of sample data in the vertical field is collected according to the vertical field to which the model is applied. The modality of the sample data input to the model may be different depending on the task performed by the domain model. For multi-modal tasks, the model input data typically includes samples of a variety of different modalities, including but not limited to text, pictures, video, code. In this step, sample data of one or more modes in the vertical field may be collected according to the mode of the input data of the task executed by the model. Wherein, different types of sample data refer to source types of sample data, such as sample data from different data platforms, which correspond to different types.
Step S2: perform data cleaning on the multiple types of sample data to obtain multiple types of input samples of the vertical domain.
In this step, data cleaning is performed on the multiple types of sample data collected from the vertical domain to obtain higher-quality sample data as input samples, thereby ensuring the quality of the input samples. Specifically, cleaning the multiple types of sample data at least includes filtering and de-duplicating the sample data. Other commonly used data cleaning methods can also be applied to further improve sample quality, which is not limited herein. In addition, sample data of different modalities can be filtered with different filtering rules, which can be set and adjusted according to the needs of the actual application scenario and are not specifically limited here. For example, pictures containing text content that does not meet the requirements, or pictures whose colors do not meet preset requirements, can be filtered out; the specific text-content and color requirements can be configured according to the actual application scenario and are not specifically limited here.
In an alternative embodiment, the sampling weights of the various types of input samples may be configured. The sampling weights of the different types of input samples may be determined according to the quality and scale of each type of input samples, and may be specifically configured and adjusted by a relevant technician according to the quality and/or scale of each type of input samples in combination with empirical values, which are not specifically limited herein.
For example, if the quality of an input sample of a certain type is higher, a larger sampling weight is set for the input sample of the certain type, and if the quality of an input sample of a certain type is lower, a smaller sampling weight is set for the input sample of the certain type, so that the quality of the input samples participating in the continuous pre-training can be improved. If the scale of an input sample of a certain type is larger, a smaller sampling weight is set for the input sample of the certain type, and if the scale of the input sample of the certain type is smaller, a larger sampling weight is set for the input sample of the certain type, so that the diversity of the input samples participating in the continuous pre-training can be improved.
In the process of continuing to pre-train the pre-training model using the multiple types of input samples of the vertical domain, batches are sampled based on the sampling weights of the different types of input samples, and the pre-training model is iteratively trained on the sampled input samples. The greater the sampling weight of an input sample, the more likely it is to be sampled into the continued pre-training; the smaller the weight, the less likely. This improves the quality and diversity of the input samples used in continued pre-training and thus the training effect of the model.
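A minimal sketch of such weighted batch sampling is given below; the type names and weight values are placeholders, since the application leaves the concrete weights to be set from the quality and scale of each sample type.

import random

def sample_batch(samples_by_type: dict, type_weights: dict, batch_size: int, seed=None):
    # samples_by_type: {type_name: [samples]}; type_weights: {type_name: sampling weight}.
    rng = random.Random(seed)
    types = list(samples_by_type)
    weights = [type_weights[t] for t in types]
    batch = []
    for _ in range(batch_size):
        t = rng.choices(types, weights=weights, k=1)[0]   # pick a source type by weight
        batch.append(rng.choice(samples_by_type[t]))      # then pick a sample of that type
    return batch

# Placeholder weights: higher-quality or smaller corpora would get larger weights.
# batch = sample_batch({"product_info": [...], "ugc": [...]},
#                      {"product_info": 0.7, "ugc": 0.3}, batch_size=32)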
In an alternative embodiment, taking the pre-training model as a pre-training language model, the continued pre-training stage performs a causal language modeling pre-training task (such as a forward or inverse causal language modeling pre-training task) on the pre-training model using the mixed pre-training data of the vertical domain and the general domain, so as to continue pre-training the pre-training model and obtain a pre-trained domain base model. Taking FIG. 4 as an example, the specific steps of constructing the pre-training data of the vertical domain are as follows:
step S41, collecting various types of text data in the vertical field.
In this step, a large amount of text data in the vertical domain is collected according to the vertical domain of the model application, and specifically, one or more types of unlabeled text data in the vertical domain may be collected, including but not limited to: news in the vertical field, books, product information, user interaction data, promotional literature, user generated content (User Generated Content, abbreviated UGC). The product information comprises relevant knowledge information of various products in the vertical field, such as information of attributes, using methods, purposes, function descriptions and the like of the various products. The user interaction behavior data refers to data generated based on interaction behavior with a user, such as search behavior data, dialogue data, and the like. User-generated content, i.e., user-authored content, includes text posted by a user, such as comments, dynamic, articles, news stories, and the like.
In addition, when collecting text data of various types in the vertical field, the type of language used by the text data may not be limited, and text data of various languages including but not limited to text data of various languages such as chinese, english, etc. may be collected.
Step S42: perform data cleaning on the multiple types of text data to obtain multiple types of text samples of the vertical domain.
In the step, data cleaning is carried out on various types of text data in the collected vertical field so as to obtain text data with higher quality as a text sample, thereby ensuring the quality of the text sample.
Specifically, the data cleaning is performed on the text data of multiple types, and at least the filtering and the de-duplication are included on the text data. In addition, the quality of the text data can be improved by other commonly used data cleaning methods to obtain a text sample with higher quality, and the embodiment is not limited herein.
For example, the server may filter the various types of text data to filter out low-quality text data that does not meet the requirements, including but not limited to: text that is too short or too long, text with too high a digit ratio, text containing a large number of special symbols, and text with abnormal punctuation or sentence breaks.
Optionally, the server may filter each type of text data according to a preset filtering rule, so as to filter out text data with low quality, which does not meet the requirement. The filtering rules may be configured and adjusted according to the needs of the actual scene to be used, which is not specifically limited herein.
For example, one example of a filtering rule may be: the length is greater than the upper length limit or the length is less than the lower length limit. Text data of too short or too long length is filtered out by filtering out text data that satisfies the filtering rule. The specific values of the upper length limit and the lower length limit may be configured and adjusted according to the needs and experience values of the actual application scenario, which are not specifically limited herein.
For example, another example of a filtering rule may be: the digital duty cycle is greater than a preset digital duty cycle threshold. Text data with a too high number duty cycle is filtered out by filtering out text data that meets the filtering rule. The preset digital duty ratio threshold value can be configured and adjusted according to the needs and experience values of the actual application scene, and is not particularly limited herein.
For example, defining a special symbol set, another example of a filtering rule may be: the number of special symbols included in the special symbol set is greater than or equal to a preset number threshold. Text containing a large number of special symbols is filtered out by filtering out text data that satisfies the filtering rule. In addition, a filtering rule may be set as follows: the duty cycle of the special symbol is greater than or equal to a preset symbol duty cycle threshold. Filtering text containing a large number of special symbols can also be achieved by the filtering rules. The special symbol set, the preset number threshold value and the preset symbol duty ratio threshold value can be configured and adjusted according to the needs and experience values of the actual application scene, and are not particularly limited herein.
Alternatively, the server may train a text classification model according to the preset filtering rules, where the text classification model is used to identify the category corresponding to a piece of text data. The categories include: a category for text that satisfies no filtering rule, and one category for each filtering rule. For example, if m (m is a positive integer) filtering rules are set, m+1 categories may be defined: one for text that satisfies no filtering rule, and one corresponding to each of the m filtering rules. The server trains the text classification model to predict which of the m+1 categories an input piece of text data belongs to. Each piece of text data is input into the text classification model, which identifies its category; if the text data falls into the category corresponding to any filtering rule, i.e., it satisfies that filtering rule, the text data is filtered out.
For example, if classification by the text classification model determines that a piece of text data satisfies a filtering rule covering abnormal sentence breaks and punctuation, the text is filtered out. Text data containing sentence-break and punctuation anomalies can thus be removed.
Optionally, the data cleaning performed on the multiple types of text data further includes removing invisible characters and invalid text from each piece of text data. Invisible characters, also called control characters or non-printing characters, are characters that are not displayed in the text, such as spaces, tabs, carriage returns and line feeds. Invalid text refers to preconfigured content that does not belong to the original content of the text and may interfere with natural language understanding of the text, for example a repeated prefix added at the beginning of the text, such as the file name of the source file. For example, an invisible-character set and an invalid-text set may be predefined, and any content from these sets appearing in the text data is removed. The specific contents of the invisible-character set and the invalid-text set can be configured and adjusted according to the needs and experience of the actual application scenario, and are not specifically limited herein.
Since repeated data may affect the model training process, in this embodiment de-duplication is performed on each type of text data during data cleaning. This may be realized as follows: generate the text fragments of a preset length contained in each piece of text data, where the length of a text fragment refers to the number of tokens (words obtained by segmentation) it contains; compute digest information for the text fragments of each piece of text data; determine repeated text data according to the overlap degree between the digest information of the text fragments of different pieces of text data, and de-duplicate the repeated text data.
The text fragments of preset length contained in a piece of text data may be fragments of length n (i.e., n-grams) generated by an n-gram algorithm, each fragment consisting of n consecutive tokens of the text data, where n is the preset fragment length and may be configured according to the needs of the actual application scenario, for example n may be 1, 2, 3 and so on, which is not specifically limited herein. The digest information of each text fragment is computed with a digest algorithm, yielding a set of digest information for the fragments of that piece of text data. The overlap degree between the fragment-digest sets of different pieces of text data serves as the similarity between them, and text data with a higher overlap degree is determined to be repeated text data. The digest algorithm may be a Secure Hash Algorithm (SHA), the MD5 Message-Digest Algorithm (MD5), or the like.
For example, consider the fragment-digest sets A and B of any two pieces of text data T1 and T2. If T1 contains s1 text fragments and T2 contains s2 text fragments, then set A contains s1 digests and set B contains s2 digests. Let s be the number of identical digests shared by A and B, where 0 ≤ s ≤ min{s1, s2} and min{s1, s2} is the smaller of s1 and s2. The overlap degree of A and B may then be determined as s/(s1+s2-s). When the overlap degree of A and B is greater than or equal to an overlap threshold, T1 and T2 are determined to be repeated text data. The overlap threshold can be configured and adjusted according to the needs and empirical values of the actual application scenario, and is not specifically limited herein.
Alternatively, the overlap degree of sets A and B may be calculated as s/(s1+s2), with a different overlap threshold set for each calculation method. Optionally, repeated text data may also be determined by directly comparing whether two pieces of text data are identical. In addition, other existing methods for judging whether two different texts are duplicates in content may be used to determine the repeated text data among the collected types of text data in the vertical domain, which is not specifically limited in this embodiment.
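A minimal sketch of the n-gram digest comparison described above, assuming MD5 digests, whitespace tokenization, and the overlap formula s/(s1+s2-s); n and the overlap threshold are illustrative values.

```python
import hashlib

def ngram_digests(text, n=3):
    """MD5 digests of all length-n token fragments (n-grams) of a text."""
    tokens = text.split()                      # assumed whitespace tokenization
    grams = [" ".join(tokens[i:i + n]) for i in range(max(len(tokens) - n + 1, 1))]
    return {hashlib.md5(g.encode("utf-8")).hexdigest() for g in grams}

def overlap_degree(a, b):
    """Overlap degree s / (s1 + s2 - s) of two digest sets."""
    s = len(a & b)
    return s / (len(a) + len(b) - s) if (a or b) else 0.0

def is_duplicate(t1, t2, threshold=0.8):
    """Judge two texts as repeated text data when their overlap reaches the threshold."""
    return overlap_degree(ngram_digests(t1), ngram_digests(t2)) >= threshold
```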
Alternatively, for text data determined to be duplicate, one of the text data reservations may be randomly selected, and other text data duplicate the reserved text data may be deleted. Optionally, when duplicate text data is removed, a duplicate relation graph may be further constructed according to a duplicate relation between the text data, where each text data corresponds to a node in the graph, and an edge between two nodes indicates that the text data corresponding to the two nodes is duplicate text data. By identifying connected blocks (nodes containing text data which are repeated mutually) in the repeated relation diagram, for each connected block in the repeated relation diagram, only one node selected randomly is reserved, other nodes in the connected block are deleted, the nodes are deleted, and text data corresponding to the nodes are deleted at the same time, so that only one text is reserved for a plurality of text data which are repeated mutually, and all other texts are deleted.
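As a sketch of the duplicate-relation graph, the following builds the graph from pairwise duplicate judgments and keeps one text per connected component; the union-find representation and the stand-in exact-match predicate are implementation choices, not mandated by this embodiment, and the digest-overlap test above could be passed in instead.

```python
def dedup_by_components(texts, is_duplicate):
    """Keep one text per connected component of the duplicate-relation graph."""
    parent = list(range(len(texts)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]      # path halving
            x = parent[x]
        return x

    def union(x, y):
        parent[find(x)] = find(y)

    # An edge joins i and j whenever the pair is judged to be duplicate text data.
    for i in range(len(texts)):
        for j in range(i + 1, len(texts)):
            if is_duplicate(texts[i], texts[j]):
                union(i, j)

    kept, seen_roots = [], set()
    for i, text in enumerate(texts):           # retain one text per component
        root = find(i)
        if root not in seen_roots:
            seen_roots.add(root)
            kept.append(text)
    return kept

# Usage with an exact-match predicate as a stand-in for the digest-overlap test.
unique = dedup_by_components(
    ["red cotton shirt", "red cotton shirt", "blue denim jeans"], lambda a, b: a == b
)
```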
Step S43, splicing text samples of the same type, splitting a splicing result into a plurality of input samples based on a preset maximum input length, and constructing input sample sets of all types, wherein the pre-training data of the vertical field comprises input sample sets of all types in the vertical field.
After data cleaning is carried out on various types of text data in the vertical field, each text data after cleaning is used as a text sample for continuous pre-training.
In this step, for each type of text sample, all text samples of that type are spliced together; the splicing result is then segmented into multiple segments according to the preset maximum input length of the model, and each segment serves as an input sample, thereby constructing the input sample set of that type. The length of each segmented input sample is at most the preset maximum input length, which may be set according to the maximum input length the model accepts. By splicing multiple text samples of the same type and segmenting the result according to the preset maximum input length, the length of the input samples in the pre-training data is increased, which improves the efficiency of continued pre-training of the pre-training model.
Alternatively, when splicing all text samples of a given type, an end-of-sequence symbol (EOS) may be appended after each text sample before splicing, so that the EOS separates different text samples. When the causal language modeling pre-training task is executed, the EOS symbols in an input sample are treated as ordinary tokens: the pre-training model predicts the first token X1 from the beginning-of-sequence symbol (BOS) of the training sample, predicts the second token X2 from the BOS and X1, predicts the third token X3 from the BOS, X1 and X2, and so on until the last token is predicted from the BOS and all preceding tokens.
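The splicing and splitting of step S43 can be sketched as follows; the EOS marker, the maximum input length and the toy tokenizer are illustrative placeholders for whatever the chosen pre-training model actually uses.

```python
def build_input_samples(text_samples, tokenizer, max_len=2048, eos="</s>"):
    """Concatenate same-type text samples separated by an EOS marker, then cut the
    result into chunks of at most max_len tokens; each chunk is one input sample."""
    tokens = []
    for sample in text_samples:
        tokens.extend(tokenizer(sample))      # assumed: tokenizer(str) -> list of tokens
        tokens.extend(tokenizer(eos))         # EOS separates different text samples
    return [tokens[i:i + max_len] for i in range(0, len(tokens), max_len)]

# Toy whitespace "tokenizer" for illustration; a real subword tokenizer would be used.
toy_tokenizer = lambda s: s.split()
chunks = build_input_samples(["first sample text", "second sample text"],
                             toy_tokenizer, max_len=4)
```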
In this embodiment, through steps S41-S43, multiple types of text data in the vertical domain are collected and cleaned to obtain multiple types of higher-quality text samples; text samples of the same type are then spliced and segmented into input samples according to the preset maximum input length of the model, which increases the length of the input samples in the pre-training data and thereby improves the efficiency of continued pre-training of the pre-training model.
Step S44, obtaining the pre-training data of the general domain.
Alternatively, the general-domain pre-training data may directly reuse the training data used in the pre-training phase of the pre-training model. Optionally, the server may also construct the general-domain pre-training data by collecting various types of text data in the general domain and processing them in a manner similar to the foregoing steps S42-S43; for the specific implementation, refer to the content of steps S42-S43, which is not repeated here.
The various types of text data in the general domain may be collected as various types of unlabeled text in the general domain, including but not limited to encyclopedia knowledge, books, blogs, news, question-answer data, and text from other types and sources. When collecting the various types of text data in the general domain, the language of the text data is not restricted; text data in multiple languages, including but not limited to Chinese and English, may be collected.
Step S45, mixing the pre-training data of the vertical domain with the pre-training data of the general domain according to a preset mixing mode to obtain the mixed pre-training data.
In this embodiment, the preset mixing mode is to mix the pre-training data of the vertical domain with the pre-training data of the general domain at a preset mixing ratio. The preset mixing ratio is the preconfigured ratio of vertical-domain pre-training data to general-domain pre-training data.
When constructing the mixed pre-training data of the vertical domain and the general domain, the proportion of vertical-domain pre-training data should be neither too small nor too large. If the proportion of vertical-domain pre-training data is too small, the model learns insufficient vertical-domain knowledge, which harms its text understanding and generation capability in the vertical domain. If the proportion is too large, the model cannot well retain the original strong semantic understanding and representation capability of the pre-training model, which also harms its text understanding and generation capability in the vertical domain. With the proportion of vertical-domain pre-training data at about 1/2, i.e., vertical-domain and general-domain data mixed at a ratio close to 1:1, continued pre-training of the pre-training model on the resulting mixed data lets the model become familiar with the text formats and data distribution of the vertical domain while retaining, as far as possible, the original strong semantic understanding and representation capability of the pre-training model, and injects more vertical-domain knowledge into the model, thereby enhancing its text understanding and generation capability in the vertical domain. In this embodiment, a better training effect is obtained when the preset mixing ratio is 1:1 or close to 1:1; the specific value can be configured and adjusted according to the needs of the actual application scenario and is not specifically limited herein. For example, the preset mixing ratio may be configured as 1:1, 2:3, 2:1, and so on.
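As a minimal sketch of this mixing step, assuming both corpora are already lists of input samples and taking the ratio by sample count; sampling with replacement from the general-domain corpus when sizes differ is an illustrative choice, not one mandated by this embodiment.

```python
import random

def mix_pretraining_data(vertical_samples, general_samples, ratio=(1, 1), seed=0):
    """Mix vertical-domain and general-domain input samples at ratio[0]:ratio[1]."""
    rng = random.Random(seed)
    n_vertical = len(vertical_samples)
    n_general = round(n_vertical * ratio[1] / ratio[0])
    general_part = rng.choices(general_samples, k=n_general)  # sample with replacement
    mixed = list(vertical_samples) + general_part
    rng.shuffle(mixed)
    return mixed

mixed = mix_pretraining_data(["v1", "v2"], ["g1", "g2", "g3"])  # toy 1:1 example
```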
In an optional embodiment, when the data set of the vertical domain is used to perform fine tuning training on the domain base model in step S32, any manner of performing fine tuning training on the pre-training model by using the labeled data of the vertical domain in the prior art may be adopted, which is not described herein.
In another alternative embodiment, in step S32, when performing fine-tuning training on the domain base model using the data set of the vertical domain, the server may perform fine-tuning training on the domain base model using a constructed multitasking instruction data set of the vertical domain, to obtain a domain model applicable to the vertical domain. The multitasking instruction data set comprises instruction data of various different tasks, where instruction data refers to the data used to generate the task instructions input to the model in the fine-tuning training stage, and comprises task prompt format information, input information and an output result. Different tasks use different task prompt format information. The task prompt format information is used to instruct the model on how to execute the corresponding task to obtain a task processing result. Specifically, as shown in fig. 5, the server may construct a multitasking instruction data set for the vertical domain through the following steps S51-S53:
Step S51, acquiring instruction data of various existing tasks in the vertical field, wherein the instruction data comprises task prompt format information, input information and output results, and the task prompt format information used by different tasks is different.
In this step, the labeled training data of existing tasks in the vertical domain may be obtained by collecting the publicly available fine-tuning data sets used to fine-tune pre-training models for various tasks in the vertical domain (i.e., to obtain models adapted to specific tasks). A task for which such a fine-tuning data set has already been built is referred to as an existing task in this embodiment.
Further, the labeled training data of the various existing tasks in the vertical domain is converted into a unified instruction-data format comprising task prompt format information, input information and an output result. The task prompt format information instructs the model on how to execute the corresponding task to obtain a task processing result. The input information comprises the one or more inputs of the task, and the output result is the labeled task processing result.
The task prompt format information of each task contains at least the following: a prompt (Prompt), input items (Input), and output items (Output). The prompt states the requirement/purpose of the task. The input items indicate which pieces of information the input information contains. The output items indicate the pieces of information the model needs to generate, i.e., the pieces of information the task processing result contains. In addition, the task prompt format information of some tasks may also contain preset example information, such as examples of input items and examples of output items, providing the model with references.
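As an illustration of this unified format, a single instruction-data record could be represented as the following sketch; the field names and the sample named-entity-recognition content are illustrative assumptions, not values fixed by this embodiment.

```python
from dataclasses import dataclass, field

@dataclass
class InstructionData:
    """One instruction-data record in the unified format."""
    prompt: str                                    # task demand/purpose (Prompt)
    input_fields: dict                             # pieces of input information (Input)
    output: str                                    # labeled task processing result (Output)
    examples: list = field(default_factory=list)   # optional reference examples

# Illustrative record for a named entity recognition task in the e-commerce domain.
ner_record = InstructionData(
    prompt="Extract all named entities about attribute, brand, component and product "
           "type from the input text and give their categories.",
    input_fields={"Input": "Lightweight cotton T-shirt by BrandX, breathable fabric"},
    output="brand: BrandX; product type: T-shirt; attribute: lightweight, breathable; "
           "component: cotton",
)
```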
By way of example, common natural language processing tasks can be divided into three broad categories: information extraction tasks, text classification tasks and text generation tasks. Information extraction tasks include named entity recognition (detecting the named entities contained in a given text and giving their classification categories), entity detection (detecting the named entities contained in a given text), attribute detection and attribute extraction (detecting information in a given text that describes attributes of an entity, such as the place of production of a product), question answering about a given entity (such as a product), and the like. Text classification tasks include entity classification (given entity-related text such as an entity name, attributes or descriptive text, identifying the classification category of the entity), emotion classification (classifying the emotion of a given text, such as a user comment), dialogue intention classification (identifying the user's intention from dialogue content), and the like. Text generation tasks include title generation (e.g., generating a title for a commodity from its attributes and description information), copy generation (e.g., generating promotional copy for a commodity from its attributes and description information), dialogue reply generation (generating a reply to an input question), user query rewriting (rewriting the user's query in a search system), and the like.
In this embodiment, the labeled training data of the various natural language processing tasks existing in the vertical domain can be collected, and the instruction data of the existing tasks can be constructed in the unified instruction-data format.
In an alternative embodiment, the task prompt format information can be configured based on a chain of thought (CoT), combining the input items, the output items and intermediate reasoning steps, so that the prompt introduces not only the final output items but also the intermediate reasoning steps. This can effectively improve the quality of the task processing result and help the user understand it.
For example, taking the commodity comparison task in the e-commerce domain as an example, the following task prompt format information may be configured based on the chain-of-thought method:
Prompt: execute the commodity comparison function; based on the titles and attributes of the two commodities, summarize the characteristics, similarities and differences of the two commodities, describe the advantages of each commodity in terms a consumer can understand, and finally give recommendation suggestions for different applicable scenarios.
The input items are as follows:
Commodity 1: { title, attributes and price of commodity 1 }
Commodity 2: { title, attributes and price of commodity 2 }
The output items and output formats are as follows:
Characteristics of commodity 1: { characteristics of commodity 1 }.
Characteristics of commodity 2: { features of commodity 2 }.
The same points: { the same point of two commodities }.
The difference is that: { different points of two commodities }.
The advantage of commodity 1 is { advantage of commodity 1 }.
The advantage of commodity 2 is { the advantage of commodity 2 }.
Recommendation advice: { shopping advice according to different applicable scenes }.
Under this task prompt format information, the prompt includes the intermediate reasoning process by which the model compares the commodities. The input items indicate that the information of commodity 1 and commodity 2 to be compared must be supplied; in this example, the input commodity information is illustrated as including title, attributes and price. The output format indicates that the output items comprise the characteristics of commodity 1, the characteristics of commodity 2, the similarities of the two commodities, the differences of the two commodities, the advantages of commodity 1, the advantages of commodity 2, and the recommendation suggestion. In this way, the task processing result given by the model includes not only the final recommendation but also the intermediate comparison of the two commodities. Further, based on the task processing result output by the model, the corresponding parts of the result can be extracted, for example by regular expressions, and displayed to the user.
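A minimal sketch of that regular-expression extraction is given below; it assumes the model echoes each output item as a "label: content" line, and the label strings themselves are illustrative.

```python
import re

OUTPUT_LABELS = [
    "Characteristics of commodity 1", "Characteristics of commodity 2",
    "The same points", "The difference", "The advantage of commodity 1",
    "The advantage of commodity 2", "Recommendation advice",
]

def extract_output_items(result_text: str) -> dict:
    """Pull each labeled output item out of the model's comparison result."""
    items = {}
    for label in OUTPUT_LABELS:
        # Assumes each item appears as "<label>: <content>" on its own line.
        match = re.search(rf"{re.escape(label)}\s*[::]\s*(.+)", result_text)
        if match:
            items[label] = match.group(1).strip()
    return items
```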
Step S52, aiming at the instruction data of at least one existing task, generating the instruction data of a new task according to the corresponding task transformation rule.
After the instruction data of the existing tasks has been constructed, and considering that the number of existing tasks with labeled training data in the vertical domain may be small, in this step new tasks are obtained by transforming at least one existing task, and the instruction data of the new tasks is obtained by transforming the instruction data of that existing task. Task types are thus increased through data augmentation, yielding more task instruction data and improving the diversity and volume of the multitasking instruction data set of the vertical domain.
Optionally, at least one task simplification rule corresponding to an existing task may be preconfigured, and new tasks of different difficulty are constructed by adjusting the information input and/or output by the existing task based on the task simplification rule. In this step, based on the instruction data of at least one existing task, the server may simplify the input information and/or output result of the existing task's instruction data according to the corresponding task simplification rule to obtain the input information and output result of the new task, and configure the task prompt format information of the new task.
For example, the task of identifying a named entity is to identify the named entity contained in the input text according to the input text, and give a classification category (i.e. entity type) corresponding to the named entity. The named entity recognition task is simplified to an entity detection task by adapting the output of the named entity recognition task to include only named entities contained in the input text. The named entity recognition task is simplified into an entity classification task by adjusting the input of the named entity recognition task into the name of the named entity and the output into the classification category corresponding to the named entity.
The task simplification rule of the existing task and the task prompt format information of the new task can be configured by related technicians according to the existing task and the new task of the actual application scene, and are not particularly limited herein.
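The simplification of a named entity recognition record into an entity-detection record and an entity-classification record can be sketched as follows; the record layout, prompts and field names are illustrative assumptions.

```python
def simplify_ner_record(ner_record: dict) -> tuple[dict, dict]:
    """Derive an entity-detection record and an entity-classification record
    from one named entity recognition record.

    Assumed NER record layout:
      {"input": "<text>", "entities": [{"name": "...", "type": "..."}, ...]}
    """
    entities = ner_record["entities"]

    # New task 1: entity detection -- same input, output only the entity names.
    detection = {
        "prompt": "List all named entities contained in the input text.",
        "input": ner_record["input"],
        "output": "; ".join(e["name"] for e in entities),
    }

    # New task 2: entity classification -- input an entity name, output its category.
    classification = {
        "prompt": "Give the classification category of the given entity.",
        "input": entities[0]["name"] if entities else "",
        "output": entities[0]["type"] if entities else "",
    }
    return detection, classification
```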
Alternatively, at least one task inversion rule corresponding to an existing task may be preconfigured, and a new task may be constructed by inverting input and output of the existing task based on the task inversion rule. In this step, the server may construct input information and output result of the new task based on instruction data of at least one existing task, take output result of the existing task as input information of the new task according to task inversion rule corresponding to the existing task, take input information of the existing task as output result of the new task, and configure task prompt format information of the new task.
For example, a new question-answer task may be constructed from an existing question-answer task by reversing it: the answer of the existing task becomes the question of the new task, and the question becomes the answer. As another example, an existing task may generate a commodity title from descriptive text of the commodity, with the descriptive text as input and the commodity title as output. By reversing the input and output of this existing task, a new task is constructed that generates commodity description text from the commodity title: the input of the new task is the commodity title, and the output is the descriptive text of the commodity.
The task inversion rule of the existing task and the task prompt format information of the new task may be configured by related technicians according to the existing task and the new task of the actual application scenario, which are not limited herein.
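A sketch of the inversion rule applied to a title-generation record; the field names, prompts and sample content are the same illustrative assumptions used above.

```python
def invert_record(record: dict, new_prompt: str) -> dict:
    """Swap the input and output of an existing-task record to form a new-task record."""
    return {"prompt": new_prompt, "input": record["output"], "output": record["input"]}

# Existing task: generate a commodity title from descriptive text of the commodity.
title_generation = {
    "prompt": "Generate a commodity title from the descriptive text.",
    "input": "A breathable, lightweight cotton T-shirt suitable for summer wear.",
    "output": "Lightweight Breathable Cotton Summer T-Shirt",
}

# New task: generate commodity description text from the commodity title.
description_generation = invert_record(
    title_generation,
    new_prompt="Generate descriptive text for the given commodity title.",
)
```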
Optionally, at least one task reorganization rule corresponding to the existing task may be preconfigured, and based on the task reorganization rule, a new task is constructed by splitting and reorganizing the input and output of the existing task. In this step, the server may split and reorganize the input information and the output result of the existing task according to the task reorganization rule corresponding to the existing task based on at least one instruction data of the existing task, construct the input information and the output result of the new task, and configure the task prompt format information of the new task.
For example, an existing task may be a commodity matching task whose input is the titles and attributes of two commodities and whose output is whether the two commodities are the same commodity. By splitting and recombining the input and output of this existing task, a new commodity title-attribute matching task can be constructed: the input is the title of a given commodity together with two candidate attributes, and the output is the attribute that matches the given commodity title.
The task reorganization rule of the existing task and the task prompt format information of the new task can be configured by related technicians according to the existing task and the new task of the actual application scene, and are not limited in detail herein.
Optionally, for the instruction data of at least one existing task, a large model may be used to execute the corresponding task processing according to the task prompt format information and input information in the instruction data; the resulting task processing result is then used as the labeled output result to construct new instruction data, which increases the volume of instruction data.
Step S53, based on the instruction data of the existing task and the new task, constructing a multitasking instruction data set in the vertical field.
Based on the instruction data of the existing task constructed in step S51 and the instruction data of the new task constructed in step S52, a multitasking instruction data set of the vertical domain is constructed. The multitasking instruction data set in the vertical domain contains both the instruction data of the existing task constructed in step S51 and the instruction data of the new task constructed in step S52.
In this way, by acquiring the instruction data of multiple existing tasks in the vertical domain and generating instruction data of new tasks from the instruction data of at least one existing task according to the corresponding task transformation rules, the scale of the instruction data used for fine-tuning training and the diversity of the tasks it covers can be increased, the effect of fine-tuning training can be improved, and the performance of the trained vertical-domain model and the quality of its task processing results can be improved.
Further, in step S32, the domain base model is subjected to fine tuning training using the constructed multitasking instruction data set of the vertical domain, to obtain a domain model suitable for the vertical domain.
Specifically, instruction data is sampled from the multitasking instruction data set of the vertical domain; a task instruction that contains the input information and conforms to the task prompt format is generated from the task prompt format information and input information in the instruction data; the domain base model executes task processing according to the task instruction to obtain a task processing result; a loss function is then calculated from the task processing result and the labeled output result in the instruction data, and the parameters of the domain base model are updated, training the model's ability to generate the output result from the task prompt format information and input information. Iterating this process realizes fine-tuning training of the domain base model. Fine-tuning the domain base model with the multitasking instruction data set of the vertical domain improves the model's performance and generalization on the various tasks of the vertical domain.
The task prompt format information can be understood as a template for the task instruction: by substituting the input information into the template, a task instruction containing the input information is generated according to the format requirements of the task prompt format information. For the method of generating task instructions, reference may be made to existing methods for generating model input instructions from a given prompt format/template and input information, which are not described in detail herein.
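A compact sketch of one epoch of this fine-tuning, written against a generic training interface; the record layout, the tokenizer helper and the model's supervised_loss method are placeholders for whatever framework the embodiment actually uses, with the loss assumed to be the standard causal-LM loss on the labeled output tokens.

```python
import random

def finetune_epoch(model, tokenizer, instruction_dataset, optimizer, batch_size=8):
    """One epoch of multi-task instruction fine-tuning of the domain base model."""
    data = list(instruction_dataset)
    random.shuffle(data)                               # sample instruction data
    for start in range(0, len(data), batch_size):
        batch = data[start:start + batch_size]
        # Fill the task prompt format (template) with the input information.
        instructions = [rec["prompt_format"].format(**rec["input_fields"]) for rec in batch]
        targets = [rec["output"] for rec in batch]
        encoded = tokenizer(instructions, targets)     # assumed helper returning model inputs
        loss = model.supervised_loss(encoded)          # assumed loss interface
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```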
Fig. 6 is a schematic diagram of multi-task instruction fine-tuning of the domain base model according to an embodiment of the present application. As shown in fig. 6, fine-tuning training is performed, for example, on the instruction data sets of two tasks, a named entity recognition task and a dialogue intention detection task, and fig. 6 gives example task instructions for both. The task instruction of the named entity recognition task includes: a Prompt part, i.e., the prompt from the task prompt format information, which instructs the model to extract all named entities about attributes, brands, components and product types from the input text; an Input part, i.e., the input text; and an Output part, which the model needs to generate. This task instruction is input into the domain base model, which generates and outputs the named entity recognition result. The task instruction of the dialogue intention detection task includes: a Prompt part, i.e., the prompt from the task prompt format information, which instructs the model to classify the intention of the last sentence in the input dialogue text, selecting the output from the candidate intention categories; an Input part, i.e., the input dialogue text; a Candidate Labels part, i.e., the candidate intention categories; and an Output part, which the model needs to generate. The task instruction is input into the domain base model, which generates and outputs the dialogue intention detection result. By fine-tuning the domain base model with instruction data sets of multiple different tasks, the trained domain model can be applied to the various tasks of the vertical domain and is thus a multi-task general-purpose model.
Fig. 7 is a framework diagram of obtaining a vertical-domain domain model from a pre-training model according to an exemplary embodiment of the present application. In this embodiment, as shown in fig. 7, the e-commerce domain is taken as the vertical domain and obtaining a domain large language model for the e-commerce domain is used as the illustration. Starting from a general pre-trained large language model, unlabeled pre-training data of the e-commerce domain and unlabeled pre-training data of the general domain are used together to continue pre-training the pre-trained large language model, obtaining a base large language model of the e-commerce domain; while retaining, as far as possible, the original strong semantic understanding and representation capability of the pre-trained large language model, the model becomes familiar with the text formats and data distribution of the e-commerce domain and more e-commerce domain knowledge is injected into it, thereby enhancing its text understanding and generation capability in the e-commerce domain. Further, a fine-tuning training set of instruction data of various e-commerce tasks (such as information extraction tasks, text classification tasks and text generation tasks) is used to perform multi-task instruction fine-tuning on the base large language model of the e-commerce domain, obtaining the e-commerce domain large language model. The finally obtained e-commerce domain large language model is thus better adapted to natural language processing tasks in the e-commerce domain, the performance of the model is further improved, and the generation quality of the large language model when executing various NLP tasks in the e-commerce domain is greatly improved.
Fig. 8 is a flowchart of a task processing method in the e-commerce field according to an exemplary embodiment of the present application. The execution body of this embodiment is a server in the system architecture shown in fig. 1 or fig. 2. The method of this embodiment continues pre-training and then fine-tunes a pre-trained large model to obtain an e-commerce field large model applicable to the e-commerce field. As shown in fig. 8, the method specifically comprises the following steps:
Step S81, continue pre-training the pre-trained large model using mixed pre-training data of the e-commerce field and the general field to obtain a pre-trained basic large model of the e-commerce field.
The pre-trained large model may be a large model applied to natural language processing, such as any of various pre-trained large language models. In this step, with the e-commerce field as the vertical field, the pre-trained large model is continually pre-trained using mixed pre-training data of the e-commerce field and the general field, so that the large model becomes familiar with the text formats and data distribution of the e-commerce field and more e-commerce field knowledge is injected into it, thereby enhancing its text understanding and generation capability in the e-commerce field.
Specifically, pre-training data of the e-commerce field is constructed, and the pre-training data of the e-commerce field and the pre-training data of the general field are mixed according to a preset mixing mode to obtain the mixed pre-training data.
The specific implementation manner of this step is similar to that of step S31, and the details of this step are referred to in the foregoing embodiment, and are not described herein.
Step S82, perform fine-tuning training on the basic large model of the e-commerce field using a data set of the e-commerce field to obtain the e-commerce field large model, where the e-commerce field large model is used to execute natural language processing tasks of the e-commerce field and generate corresponding task processing results.
After the mixed pre-training data of the e-commerce field and the general field are used for continuously pre-training the pre-training large model to obtain a basic large model of the pre-trained e-commerce field, the data set of the e-commerce field can be used for carrying out fine-tuning training on the basic large model to obtain the e-commerce field large model suitable for the e-commerce field, and the finally obtained e-commerce field large model can be better suitable for natural language processing tasks of the e-commerce field through the fine-tuning training on the basic large model, so that the performance of the large model is further improved. The specific implementation manner of this step is similar to that of the foregoing step S32, and the relevant content in the foregoing embodiment is specifically referred to and will not be repeated here.
According to the method, the pre-training large model is continuously pre-trained by using mixed pre-training data of the electronic commerce field and the general field, so that a basic large model of the pre-training electronic commerce field is obtained, the characteristics of text format and data distribution of the electronic commerce field are familiar to the large model on the basis of keeping the original strong semantic understanding and representing capability of the pre-training large model as much as possible, and knowledge of more electronic commerce fields is injected into the large model, so that the text understanding and generating capability of the large model in the electronic commerce field is enhanced; further, fine tuning training is performed on the basic large model by using a fine tuning data set in the e-commerce field to obtain the e-commerce field large model suitable for the e-commerce field, so that the finally obtained e-commerce field large model can be better suitable for natural language processing tasks in the e-commerce field, the performance of the large model is further improved, and the generation quality of task processing results when the large model is applied to various NLP tasks in the e-commerce field is greatly improved.
Fig. 9 is a flowchart of implementing task processing based on a vertical-domain domain model according to an exemplary embodiment of the present application. After the vertical-domain domain model is obtained by the method of any of the foregoing method embodiments, in this embodiment a natural language processing task of the vertical domain is executed based on the domain model to obtain a task processing result.
It should be noted that the vertical-domain domain model obtained by training in the foregoing embodiments may be deployed locally on the server or on a server of another organization or system platform. The execution body of this embodiment is a server that runs the vertical-domain domain model and also runs a downstream application system that uses the domain model.
As shown in fig. 9, the specific steps for implementing task processing based on the domain model of the vertical domain are as follows:
step S91, responding to a task processing request of a natural language processing task in the vertical field, and acquiring input data and task prompt format information of the natural language processing task.
In this embodiment, when an application system running on a server needs to execute a natural language processing task in a vertical domain, a task processing request is submitted to the server, where the task processing request includes input data and task category information of the natural language processing task. The task category information of different natural language processing tasks is different.
In practical applications, the downstream application system may generate a plurality of different natural language processing tasks during the running process, including but not limited to various information extraction tasks, text classification tasks, and text generation tasks. Different natural language processing tasks have different task categories, and corresponding natural language processing tasks can be determined according to the task categories, so that task prompt format information corresponding to the natural language processing tasks is obtained.
For example, taking a downstream application system as an e-commerce system in the e-commerce field as an example, after a user inputs a consultation text during the operation of the e-commerce system, the e-commerce system may generate a commodity identification task: and executing commodity identification tasks according to the consultation texts of the users, extracting entities of various categories such as commodities, commodity attributes, brands and the like contained in the consultation texts, and determining corresponding entity categories. After the commodity identification result is obtained, the electronic commerce system searches a knowledge document matched with the current commodity according to the commodity identification result and generates a question-answering task: and generating reply information of the consultation text of the user based on the searched knowledge document. After obtaining the reply information, the e-commerce system outputs the reply information.
In addition, in other scenes in the e-commerce field and other fields, there are also cases where many application systems generate multiple different natural language processing tasks, and the solution of this embodiment uses a general field model to execute the various natural language processing tasks in the vertical field, which are not listed here one by one.
Step S92, generating a task instruction according to the task prompt format information and the input data.
After obtaining the input data and task prompt format information of the natural language processing task to be executed, the server generates a task instruction containing the input data according to the task prompt format information. For example, for a commodity recognition task whose input data is the consultation text entered by the user, the input data is substituted into the input item of the task prompt format information to generate the task instruction of the commodity recognition task; as shown in fig. 6, the task instruction of a named entity recognition task instructs the model to extract the entities of categories such as commodity, attribute, brand and component from the input text and give their corresponding entity categories.
Step S93, inputting the task instruction into the field model, executing task processing based on the task instruction through the field model, and generating a task processing result.
After a task instruction of a natural language processing task to be executed is generated, the task instruction is input into a field model, natural language processing based on the task instruction is realized through reasoning of the field model, and a task processing result is generated.
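A minimal serving-side sketch of steps S92-S93: fill the task prompt template with the input data and run the domain model. The template registry, the template text and the generate interface are illustrative assumptions, not an API specified by this embodiment.

```python
# Illustrative registry mapping task categories to task prompt format templates.
TASK_TEMPLATES = {
    "commodity_ner": (
        "Extract all named entities about attribute, brand, component and product type "
        "from the input text and give their categories.\nInput: {input_text}\nOutput:"
    ),
}

def handle_task(task_category: str, input_data: dict, domain_model) -> str:
    """Steps S92-S93: build the task instruction, run the domain model, return the result."""
    template = TASK_TEMPLATES[task_category]           # task prompt format information
    task_instruction = template.format(**input_data)   # substitute the input data
    return domain_model.generate(task_instruction)     # assumed inference interface
```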
Step S94, returning a task processing result.
After generating the task processing results of the natural language processing task, the server may return the task processing results. Illustratively, the server may return the task processing results to the downstream application system and execute subsequent processing logic of the downstream application system, depending on the requirements of the downstream application system. For example, after performing a commodity identification task through the domain model and obtaining a commodity identification result, the server returns the commodity identification result to the e-commerce system. The electronic commerce system searches the knowledge document matched with the commodity according to the commodity identification result.
In this embodiment, the domain model of the vertical domain obtained by training the foregoing method embodiment is used as a general model of multiple natural language processing tasks of the vertical domain, and because the original strong semantic understanding and representing capability of the pre-training model is maintained as much as possible in the training process, knowledge of more vertical domains is injected into the domain model, the text understanding and generating capability of the domain model in the vertical domain is enhanced, and the multiple natural language processing tasks of the vertical domain including but not limited to various tasks such as named entity recognition, text classification, text generation and the like are executed by using the domain model, so that the quality of the natural language processing result can be greatly improved.
In an alternative embodiment, after the vertical-domain domain model is obtained by the method of any of the preceding method embodiments, the server may provide an application programming interface (Application Programming Interface, API) for the domain model to external devices. External devices use the natural language processing capability of the domain model through the API of the vertical-domain domain model.
Fig. 10 is a flowchart of implementing task processing based on a domain model of a vertical domain according to another exemplary embodiment of the present application. In this embodiment, the domain model and the downstream application system using the natural language processing capability of the domain model are respectively run on different electronic devices. And the electronic equipment running the domain model is used as a server of a server side, and the electronic equipment running the downstream application system is used as an end side equipment. As shown in fig. 10, the specific steps of task processing for implementing a natural language processing task based on a domain model of a vertical domain are as follows:
Step S1001, in response to a task processing request for a natural language processing task of the vertical domain, the end-side device acquires the input data and task prompt format information of the natural language processing task.
In this embodiment, when an application system running on an end-side device needs to execute a natural language processing task in a vertical domain, a task processing request is submitted to the end-side device, where the task processing request includes input data and task type information of the natural language processing task. The task category information of different natural language processing tasks is different.
In practical applications, the downstream application system may generate a plurality of different natural language processing tasks during the running process, including but not limited to various information extraction tasks, text classification tasks, and text generation tasks. Different natural language processing tasks have different task categories, and corresponding natural language processing tasks can be determined according to the task categories, so that task prompt format information corresponding to the natural language processing tasks is obtained.
For example, taking a downstream application system as an e-commerce system in the e-commerce field as an example, after a user inputs a consultation text during the operation of the e-commerce system, the e-commerce system may generate a commodity identification task: and executing commodity identification tasks according to the consultation texts of the users, extracting entities of various categories such as commodities, commodity attributes, brands and the like contained in the consultation texts, and determining corresponding entity categories. After the commodity identification result is obtained, the electronic commerce system searches a knowledge document matched with the current commodity according to the commodity identification result and generates a question-answering task: and generating reply information of the consultation text of the user based on the searched knowledge document. After obtaining the reply information, the e-commerce system outputs the reply information.
In addition, in other scenes in the e-commerce field and other fields, there are also cases where many application systems generate multiple different natural language processing tasks, and the solution of this embodiment uses a general field model to execute the various natural language processing tasks in the vertical field, which are not listed here one by one.
Step S1002, the end-side device generates a task instruction according to the task prompt format information and the input data.
The specific implementation of step S1002 is the same as that of step S92; refer to the foregoing embodiment for details, which are not repeated here.
In step S1003, the end device sends a call request to the domain model to the server, where the call request includes a task instruction.
In this embodiment, the server may provide an Application Programming Interface (API) of the domain model of the vertical domain to the outside. The end-side device may send a call request to the server through the API of the domain model of the vertical domain. The call request contains a task instruction that requires input of a domain model. The task instruction is generated according to the task prompt format information of the natural language processing task to be executed and the input data.
It should be noted that the server may obtain one or more domain models of different vertical domains in advance, where different domain models have different APIs. The server can show the APIs of the domain models of different vertical domains to the end-side device, and the end-side device can select and call different APIs according to the running application system or the vertical domain to which the current natural language processing task belongs to so as to use the domain model of the vertical domain corresponding to the APIs. In addition, in practical applications, the server may receive call requests from a plurality of different end-side devices, and provide natural language processing capabilities based on a domain model of the vertical domain to the plurality of different end-side devices.
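From the end-side device, the call of step S1003 might look like the following HTTP sketch; the endpoint URL, request and response field names are purely hypothetical and not specified by this embodiment, and authentication is omitted.

```python
import json
import urllib.request

API_URL = "https://server.example.com/api/vertical-domain-model"   # hypothetical endpoint

def call_domain_model_api(task_instruction: str) -> str:
    """Send the task instruction to the server's domain-model API (step S1003)."""
    payload = json.dumps({"task_instruction": task_instruction}).encode("utf-8")
    request = urllib.request.Request(
        API_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    return body["task_processing_result"]               # assumed response field
```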
Step S1004, the server inputs the task instruction into the field model, and performs task processing based on the task instruction through the field model to generate a task processing result.
After receiving a call request for the domain model, the server acquires a task instruction, inputs the task instruction into the domain model in the vertical domain, performs task processing based on the task instruction through the domain model, and generates a task processing result. The domain model operated by the server in this embodiment is obtained by using mixed pre-training data of a vertical domain and a general domain to continuously pre-train the pre-training model to obtain a pre-trained domain base model, and then using a fine-tuning data set of the vertical domain to perform fine-tuning training on the domain base model. The specific process of obtaining the domain model is referred to in the related content of the foregoing embodiment, and will not be described herein.
Step S1005, the server returns a task processing result to the end-side device.
In this embodiment, the vertical-domain domain model trained in any of the foregoing method embodiments is used as a general-purpose model for multiple natural language processing tasks of the vertical domain, and an API of the domain model is provided to external devices. Because the original strong semantic understanding and representation capability of the pre-training model is retained as far as possible during training and more vertical-domain knowledge is injected into the domain model, the text understanding and generation capability of the domain model in the vertical domain is enhanced; a downstream application system that uses the domain model to execute the various natural language processing tasks of the vertical domain, including but not limited to named entity recognition, text classification, text generation and other tasks, can therefore greatly improve the quality of its natural language processing results.
On the basis of any of the method embodiments, the task prompt format information of any type of task can be configured in combination with the chain-of-thought (CoT) method, so that the task prompt format information contains a natural-language description of the reasoning steps and instructs the model to reason about the task step by step to reach the final result. The model can then output not only the final result but also the intermediate reasoning process, which effectively improves the quality of the output result and reduces the user's cost of understanding and decision-making.
For example, take the commodity comparison task in the e-commerce domain: compare two commodities and, based on their related information, give a recommendation that meets the user's needs. One chain of thought for solving this task may be: based on the information of the two commodities, give the characteristics, similarities and differences of each, describe their respective advantages, and finally give the recommended commodity according to the user's needs. Based on this chain of thought, the task prompt format information of the commodity comparison task can be configured as follows:
Prompt: execute the commodity comparison function; based on the titles and attributes of the two commodities, summarize the characteristics, similarities and differences of the two commodities, describe the advantages of each commodity in terms a consumer can understand, and finally give recommendation suggestions for different applicable scenarios.
The input items are as follows:
Commodity 1: { title, attributes and price of commodity 1 }
Commodity 2: { title, attributes and price of commodity 2 }
The output items and output formats are as follows:
characteristics of commodity 1: { characteristics of commodity 1 }.
Characteristics of commodity 2: { features of commodity 2 }.
The same points: { the same point of two commodities }.
The difference is that: { different points of two commodities }.
The advantage of commodity 1 is { advantage of commodity 1 }.
The advantage of commodity 2 is { the advantage of commodity 2 }.
Recommendation advice: { shopping advice according to different applicable scenes }.
Under this task prompt format information, the prompt includes the intermediate reasoning process by which the model compares the commodities. The input items indicate that the information of commodity 1 and commodity 2 to be compared must be supplied; in this example, the input commodity information is illustrated as including title, attributes and price. The output format indicates that the output result comprises the characteristics of commodity 1, the characteristics of commodity 2, the similarities of the two commodities, the differences of the two commodities, the advantages of commodity 1, the advantages of commodity 2, and the recommendation suggestion. In this way, the task processing result given by the model includes not only the final recommendation but also the intermediate comparison of the two commodities. Further, based on the task processing result output by the model, the output items to be displayed can be extracted from it according to preset rules and shown to the user through the front-end interface; for example, key information in the task processing result can be extracted with preconfigured regular expressions and displayed to the user.
As shown in fig. 11, based on the task prompt format information of the commodity comparison task, the flow for implementing the commodity comparison task with the e-commerce domain model is as follows: first, based on the information of the given commodity 1 and commodity 2, a chain-of-thought (CoT) task instruction is generated from the task prompt format information configured with the chain-of-thought method. The task instruction is input into the e-commerce domain model, which executes a step-by-step reasoning process based on the chain of thought according to the task instruction and generates the comparison result. The generated comparison result comprises the characteristics of commodity 1, the characteristics of commodity 2, the similarities of the two commodities, the differences of the two commodities, the advantages of commodity 1, the advantages of commodity 2, and the recommendation suggestion. In other examples, the respective shortcomings of the two commodities and the like may also be given; the chain of thought of the task, and the intermediate reasoning steps of that chain included in the task prompt format information, can be configured by the relevant technical personnel according to the needs of the actual application scenario, and are not specifically limited herein.
Fig. 12 is a flowchart of the commodity comparison method provided in this embodiment. As shown in fig. 12, the specific steps for implementing commodity comparison based on the large language model in the e-commerce field are as follows:
Step S121, obtaining information of a plurality of commodities to be compared and task prompt format information.
The obtained information of a commodity includes key information such as the title, attribute, price, usage method, application scenario, and product description of the commodity; the specific key information can be configured in the input items of the task prompt format information according to the actual application scenario, which is not limited herein.
The information of the commodities obtained in this step is the input item information required for executing the commodity comparison task. The task prompt format information of the commodity comparison task is as described in the example of the foregoing embodiment and is not repeated here.
Step S122, generating a task instruction according to the task prompt format information and the information of the plurality of commodities.
Step S123, inputting the task instruction into a large language model in the E-commerce field, and executing commodity comparison processing according to the task instruction through the large language model to obtain a commodity comparison output result.
Step S124, outputting the output result of the commodity comparison.
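Steps S121 to S124 can be sketched as follows, assuming the task prompt format information is held as a plain-text template and the large language model in the e-commerce field is exposed through a generic `generate` callable; the template text, field names and the `domain_model.generate` interface are illustrative assumptions, not a fixed implementation of this embodiment:

```python
# Step S121: information of the commodities to be compared and the task prompt
# format information (here an assumed plain-text template with placeholders).
TASK_PROMPT_TEMPLATE = (
    "Prompt: execute the commodity comparison function; summarize the characteristics, "
    "same points and different points of the two commodities, describe their advantages "
    "from the consumer's perspective, and give a recommendation suggestion.\n"
    "Commodity 1: {c1}\n"
    "Commodity 2: {c2}\n"
    "Output the characteristics, same points, different points, advantages and a "
    "recommendation suggestion in the configured format."
)

def compare_commodities(commodity_1: dict, commodity_2: dict, domain_model) -> str:
    # Step S122: generate the task instruction from the task prompt format
    # information and the information of the plurality of commodities.
    task_instruction = TASK_PROMPT_TEMPLATE.format(
        c1=f"title={commodity_1['title']}, attribute={commodity_1['attribute']}, price={commodity_1['price']}",
        c2=f"title={commodity_2['title']}, attribute={commodity_2['attribute']}, price={commodity_2['price']}",
    )
    # Step S123: input the task instruction into the large language model in the
    # e-commerce field and execute the commodity comparison processing.
    output_result = domain_model.generate(task_instruction)
    # Step S124: return the commodity comparison output result for display.
    return output_result
```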
For example, fig. 13 is an interface example showing the comparison result of the commodities, and as shown in fig. 13, the characteristics of each of the two commodities may be output in the interface, the advantages and disadvantages of the two commodities are summarized, and a recommendation suggestion is given in combination with the use scenario of the user. In addition, various parameters of the commodity can be output in the interface, and key parameters (such as advantage parameters) of the commodity are highlighted based on the characteristics, advantages and disadvantages of the commodity and the like.
In this embodiment, the task prompt format information of various tasks is configured based on the chain-of-thought (CoT) method, so that the task prompt format information includes a natural language description of the intermediate reasoning steps and instructs the domain model to perform task reasoning step by step to obtain the final result. In this way, not only the final result but also the intermediate reasoning process can be output, the quality of the output result can be effectively improved, and the understanding and decision cost of the user can be reduced.
Fig. 14 is a schematic structural diagram of a server according to an embodiment of the present application. As shown in fig. 14, the server includes a memory 1401 and a processor 1402. The memory 1401 is used to store computer-executable instructions and may be configured to store various other data to support operations on the server. The processor 1402 is communicatively connected to the memory 1401 and is configured to execute the computer-executable instructions stored in the memory 1401, so as to implement the technical solution provided in any one of the above method embodiments; the specific functions and the technical effects that can be achieved are similar and are not repeated herein.
In fig. 14, the server is illustrated as a cloud server deployed in the cloud; alternatively, the server may be a local server. Optionally, as shown in fig. 14, the server further includes: a firewall 1403, a load balancer 1404, a communication component 1405, a power component 1406, and other components. Only some of the components are schematically shown in fig. 14, which does not mean that the server includes only the components shown in fig. 14.
The embodiment of the present application further provides a computer-readable storage medium in which computer-executable instructions are stored. When a processor executes the computer-executable instructions, the method of any one of the foregoing embodiments is implemented; the specific functions and achievable technical effects are similar to those described above and are not repeated herein.
Embodiments of the present application also provide a computer program product comprising a computer program which, when executed by a processor, implements the method of any of the preceding embodiments. The computer program is stored in a readable storage medium, from which at least one processor of the server can read it; execution of the computer program by the at least one processor causes the server to execute the technical solution provided in any one of the method embodiments. The specific functions and achievable technical effects are similar and are not repeated herein.
The embodiment of the application provides a chip, comprising a processing module and a communication interface, where the processing module can execute the technical solution of the server in the foregoing method embodiments. Optionally, the chip further includes a storage module (e.g., a memory) configured to store instructions; the processing module is configured to execute the instructions stored in the storage module, and execution of these instructions causes the processing module to execute the technical solution provided in any one of the foregoing method embodiments.
The integrated modules, which are implemented in the form of software functional modules, may be stored in a computer readable storage medium. The software functional modules described above are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) or processor to perform some of the steps of the methods of the various embodiments of the application.
It should be appreciated that the processor may be a central processing unit (CPU), but may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like. The steps of a method disclosed in connection with the present application may be embodied directly in a hardware processor for execution, or in a combination of hardware and software modules in a processor for execution. The memory may comprise a high-speed RAM memory, and may further comprise a non-volatile memory (NVM), such as at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disk, etc.
The memory may be an object storage service (OSS). The memory may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk.

The communication component is configured to facilitate wired or wireless communication between the device in which the communication component is located and other devices. The device in which the communication component is located may access a wireless network based on a communication standard, such as WiFi, a mobile communication network of the second generation (2G), third generation (3G), fourth generation (4G)/Long Term Evolution (LTE), or fifth generation (5G) mobile communication system, or a combination thereof. In one exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component further includes a near field communication (NFC) module to facilitate short-range communication. For example, the NFC module may be implemented based on radio frequency identification (RFID) technology, infrared data association (IrDA) technology, ultra wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.

The power supply component provides power for the various components of the device in which it is located. The power supply component may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for the device in which the power supply component is located.

The storage medium may likewise be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as SRAM, EEPROM, EPROM, PROM, ROM, magnetic memory, flash memory, magnetic disk, or optical disk. A storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
An exemplary storage medium is coupled to the processor such that the processor can read information from, and write information to, the storage medium. Alternatively, the storage medium may be integral to the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). It is also possible that the processor and the storage medium reside as discrete components in an electronic device or a master device.
It should be noted that, in this document, the terms "comprises", "comprising", or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The order of the above-described embodiments of the application is merely for description and does not represent the advantages or disadvantages of the embodiments. In addition, some of the flows described in the above embodiments and the drawings include a plurality of operations that appear in a particular order, but it should be clearly understood that these operations may be performed out of the order in which they appear herein or in parallel; the sequence numbers of the operations are merely used to distinguish the operations from one another and do not represent any order of execution. In addition, these flows may include more or fewer operations, and the operations may be performed sequentially or in parallel. It should be noted that the terms "first" and "second" herein are used to distinguish different messages, devices, modules, etc.; they do not represent a sequence, nor do they require that the "first" and the "second" be of different types. The meaning of "a plurality of" is two or more, unless specifically defined otherwise.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general hardware platform, or by means of hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk), which includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to perform the methods of the embodiments of the present application.
The foregoing description covers only the preferred embodiments of the present application and is not intended to limit the scope of the application; any equivalent structural or process transformation made using the contents of this specification, or any direct or indirect application thereof in other related technical fields, is likewise included within the scope of protection of the present application.

Claims (14)

1. A method of task processing, comprising:
continuing to pre-train a pre-training model by using mixed pre-training data of a vertical domain and a general domain to obtain a pre-trained domain base model;
and performing fine-tuning training on the domain base model by using a data set of the vertical domain to obtain a domain model applicable to the vertical domain, wherein the domain model is used for executing natural language processing tasks of the vertical domain and generating corresponding task processing results.
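By way of illustration of the two training stages in claim 1, a minimal sketch assuming a HuggingFace-style causal language model; the checkpoint name, file paths, sequence length and epoch counts are assumptions for illustration, not values specified by the claim:

```python
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)
from datasets import load_dataset

base = "base-pretrained-llm"  # assumed pre-training model checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base)
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM objective

# Stage 1: continue pre-training on mixed vertical-domain + general-domain data.
mixed = load_dataset("json", data_files="mixed_pretrain.jsonl")["train"]
mixed = mixed.map(lambda x: tokenizer(x["text"], truncation=True, max_length=2048),
                  remove_columns=mixed.column_names)
Trainer(model=model,
        args=TrainingArguments(output_dir="domain_base_model", num_train_epochs=1),
        train_dataset=mixed, data_collator=collator).train()

# Stage 2: fine-tuning training on the vertical-domain data set
# (e.g. a multi-task instruction data set) to obtain the domain model.
sft = load_dataset("json", data_files="vertical_instructions.jsonl")["train"]
sft = sft.map(lambda x: tokenizer(x["prompt"] + x["output"],
                                  truncation=True, max_length=2048),
              remove_columns=sft.column_names)
Trainer(model=model,
        args=TrainingArguments(output_dir="domain_model", num_train_epochs=3),
        train_dataset=sft, data_collator=collator).train()
```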
2. The method according to claim 1, wherein before the continuing to pre-train the pre-training model by using the mixed pre-training data of the vertical domain and the general domain to obtain the pre-trained domain base model, the method further comprises:
constructing pre-training data of the vertical domain;
and mixing the pre-training data of the vertical domain with pre-training data of the general domain according to a preset mixing mode to obtain the mixed pre-training data.
3. The method according to claim 2, wherein the mixing the pre-training data of the vertical domain with the pre-training data of the general domain according to the preset mixing mode to obtain the mixed pre-training data comprises:
mixing the pre-training data of the vertical domain with the pre-training data of the general domain according to a preset mixing ratio to obtain the mixed pre-training data;
or,
where the pre-training data of the vertical domain comprises pre-training data of a plurality of sub-domains, mixing, according to an application domain of the domain model, the pre-training data of the sub-domain corresponding to the application domain with the pre-training data of the general domain to obtain the mixed pre-training data.
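As an illustration of the mixing by a preset mixing ratio in claim 3, a minimal sketch; the down-sampling strategy, the default 1:1 ratio and the fixed random seed are assumptions, since the claim does not fix a particular preset mixing mode:

```python
import random

def mix_pretrain_data(vertical_samples, general_samples, ratio=(1, 1), seed=42):
    """Mix vertical-domain and general-domain pre-training samples according to a
    preset mixing ratio (vertical : general); e.g. ratio=(1, 1) keeps one general
    sample for every vertical sample."""
    rng = random.Random(seed)
    v_part, g_part = ratio
    # Down-sample the general-domain corpus so the two parts respect the ratio.
    n_general = min(len(general_samples), len(vertical_samples) * g_part // v_part)
    mixed = list(vertical_samples) + rng.sample(list(general_samples), n_general)
    rng.shuffle(mixed)
    return mixed
```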
4. The method as recited in claim 2, further comprising:
displaying configured mixing modes through a mixing mode configuration interface, and displaying a mixing rule corresponding to each mixing mode;
in response to an editing operation on the mixing rule corresponding to any mixing mode, obtaining the edited new mixing rule of the mixing mode, and updating the mixing rule corresponding to the mixing mode;
and in response to a selection operation on any one of the mixing modes, taking the selected mixing mode as the preset mixing mode currently used.
5. The method according to claim 1, wherein the continuing to pre-train the pre-training model by using the mixed pre-training data of the vertical domain and the general domain to obtain the pre-trained domain base model comprises:
executing a causal language modeling pre-training task on the pre-training model by using the mixed pre-training data of the vertical domain and the general domain, so as to continue to pre-train the pre-training model and obtain the pre-trained domain base model.
6. The method of claim 5, wherein the executing a causal language modeling pre-training task on the pre-training model by using the mixed pre-training data of the vertical domain and the general domain, so as to continue to pre-train the pre-training model and obtain the pre-trained domain base model, comprises:
splicing text samples of a same type, wherein the mixed pre-training data comprises a plurality of types of text samples, segmenting the splicing result into a plurality of input samples based on a preset maximum input length, and constructing an input sample set of each type;
and executing the causal language modeling pre-training task on the pre-training model according to the input sample set of each type, so as to continue to pre-train the pre-training model and obtain the pre-trained domain base model.
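A minimal sketch of the splicing and segmenting in claim 6, assuming each text sample has already been tokenized into a list of token ids; the function name, the 2048-token maximum input length and the decision to drop the short tail are assumptions for illustration:

```python
from collections import defaultdict

def build_input_sample_sets(samples, max_input_length=2048):
    """samples: iterable of (sample_type, token_ids). Splice samples of the same
    type, then segment the spliced sequence into chunks of the preset maximum
    input length, yielding one input sample set per type."""
    spliced = defaultdict(list)
    for sample_type, token_ids in samples:
        spliced[sample_type].extend(token_ids)  # splice same-type samples
    sample_sets = {}
    for sample_type, ids in spliced.items():
        # Segment the splicing result into fixed-length input samples; a short
        # tail that cannot fill a full window is dropped in this sketch.
        sample_sets[sample_type] = [
            ids[i:i + max_input_length]
            for i in range(0, len(ids) - max_input_length + 1, max_input_length)
        ]
    return sample_sets
```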
7. The method of claim 1, wherein the performing fine-tuning training on the domain base model using the data set of the vertical domain to obtain a domain model applicable to the vertical domain comprises:
and performing fine-tuning training on the domain base model by using a constructed multi-task instruction data set of the vertical domain to obtain the domain model applicable to the vertical domain, wherein the instruction data set comprises a plurality of pieces of instruction data, each piece of instruction data comprises task prompt format information, input information and an output result, and the task prompt format information used by different tasks is different.
8. The method of claim 7, wherein constructing the multi-task instruction data set of the vertical domain comprises:
acquiring instruction data of a plurality of existing tasks in the vertical domain, wherein the instruction data comprises task prompt format information, input information and output results, and the task prompt format information used by different tasks is different;
for the instruction data of at least one existing task, generating instruction data of a new task according to a corresponding task transformation rule;
and constructing the multi-task instruction data set of the vertical domain based on the instruction data of the existing tasks and the new task.
9. The method of claim 8, wherein the generating instruction data of a new task according to a corresponding task transformation rule for the instruction data of at least one existing task comprises at least one of the following:
based on the instruction data of at least one existing task, simplifying the input information and/or the output result in the instruction data of the existing task according to a task simplification rule corresponding to the existing task to obtain input information and an output result of a new task, and configuring task prompt format information of the new task;
based on the instruction data of at least one existing task, according to a task inversion rule corresponding to the existing task, taking the output result of the existing task as input information of a new task and taking the input information of the existing task as an output result of the new task, so as to construct the input information and the output result of the new task, and configuring task prompt format information of the new task;
based on the instruction data of at least one existing task, splitting and recombining the input information and the output result of the existing task according to a task recombination rule corresponding to the existing task, so as to construct input information and an output result of a new task, and configuring task prompt format information of the new task.
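As an illustration of one task transformation rule in claim 9 (the task inversion rule), a minimal sketch; the dictionary keys and the example prompts are assumptions for illustration:

```python
def invert_task(existing_instruction: dict, new_prompt_format: str) -> dict:
    """Task inversion rule: the output result of the existing task becomes the
    input information of the new task, and the input information of the existing
    task becomes the output result of the new task."""
    return {
        "task_prompt_format": new_prompt_format,  # configured for the new task
        "input": existing_instruction["output"],
        "output": existing_instruction["input"],
    }

# Example: turn "title -> marketing copy" generation data into
# "marketing copy -> title" summarization data.
existing = {
    "task_prompt_format": "Write marketing copy for the commodity title below.",
    "input": "Lightweight waterproof hiking backpack 40L",
    "output": "Conquer every trail with this 40L waterproof pack ...",
}
new_sample = invert_task(existing, "Give a short commodity title for the marketing copy below.")
```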
10. The method according to any one of claims 1-9, wherein after the performing fine-tuning training on the domain base model by using the data set of the vertical domain to obtain the domain model applicable to the vertical domain, the method further comprises:
responding to a task processing request of a natural language processing task in the vertical field, and acquiring input data and task prompt format information of the natural language processing task;
generating a task instruction according to the task prompt format information and the input data;
inputting the task instruction into the domain model, executing task processing based on the task instruction through the domain model, and generating a task processing result;
and outputting the task processing result.
11. A method of task processing, comprising:
continuing to pre-train a pre-training large model by using mixed pre-training data of the e-commerce domain and the general domain to obtain a pre-trained base large model of the e-commerce domain;
and performing fine-tuning training on the base large model of the e-commerce domain by using a data set of the e-commerce domain to obtain a large model of the e-commerce domain, wherein the large model of the e-commerce domain is used for executing natural language processing tasks of the e-commerce domain and generating corresponding task processing results.
12. The method of claim 11, wherein before the continuing to pre-train the pre-training large model by using the mixed pre-training data of the e-commerce domain and the general domain to obtain the pre-trained base large model of the e-commerce domain, the method further comprises:
constructing pre-training data of the e-commerce domain;
and mixing the pre-training data of the e-commerce domain with pre-training data of the general domain according to a preset mixing mode to obtain the mixed pre-training data.
13. A task processing method, applied to a server, comprising:
receiving a call request for a domain model sent by an end-side device, wherein the call request comprises a task instruction, and the task instruction is generated according to task prompt format information and input data of a natural language processing task to be executed;
inputting the task instruction into the domain model, and executing task processing based on the task instruction through the domain model to generate a task processing result, wherein the domain model is obtained by continuing to pre-train a pre-training model with mixed pre-training data of a vertical domain and a general domain to obtain a pre-trained domain base model and then performing fine-tuning training on the domain base model by using a data set of the vertical domain;
and returning the task processing result to the end-side device.
14. A server, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor;
wherein the memory stores instructions executable by the at least one processor to cause the server to perform the method of any one of claims 1-13.
CN202311205678.9A 2023-09-18 2023-09-18 Task processing method and server Pending CN117171325A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311205678.9A CN117171325A (en) 2023-09-18 2023-09-18 Task processing method and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311205678.9A CN117171325A (en) 2023-09-18 2023-09-18 Task processing method and server

Publications (1)

Publication Number Publication Date
CN117171325A true CN117171325A (en) 2023-12-05

Family

ID=88935356

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311205678.9A Pending CN117171325A (en) 2023-09-18 2023-09-18 Task processing method and server

Country Status (1)

Country Link
CN (1) CN117171325A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117390142A (en) * 2023-12-12 2024-01-12 浙江口碑网络技术有限公司 Training method and device for large language model in vertical field, storage medium and equipment
CN117390142B (en) * 2023-12-12 2024-03-12 浙江口碑网络技术有限公司 Training method and device for large language model in vertical field, storage medium and equipment
CN118013021A (en) * 2024-04-08 2024-05-10 浙江口碑网络技术有限公司 Medicine answering method, device, equipment and medium based on large language model

Similar Documents

Publication Publication Date Title
CN107679039B (en) Method and device for determining statement intention
US20210232761A1 (en) Methods and systems for improving machine learning performance
CN111026842B (en) Natural language processing method, natural language processing device and intelligent question-answering system
US10831796B2 (en) Tone optimization for digital content
CN109284399B (en) Similarity prediction model training method and device and computer readable storage medium
CN117171325A (en) Task processing method and server
CN111708869B (en) Processing method and device for man-machine conversation
US11651015B2 (en) Method and apparatus for presenting information
CN111581360A (en) Method, system and equipment for assisting customer service
CN110598095A (en) Method, device and storage medium for identifying article containing designated information
CN112528654A (en) Natural language processing method and device and electronic equipment
CN116244412A (en) Multi-intention recognition method and device
CN115374259A (en) Question and answer data mining method and device and electronic equipment
Devi et al. ChatGPT: Comprehensive Study On Generative AI Tool
CN116955591A (en) Recommendation language generation method, related device and medium for content recommendation
CN116881462A (en) Text data processing, text representation and text clustering method and equipment
CN116522905A (en) Text error correction method, apparatus, device, readable storage medium, and program product
CN115905472A (en) Business opportunity service processing method, business opportunity service processing device, business opportunity service processing server and computer readable storage medium
CN112632962B (en) Method and device for realizing natural language understanding in man-machine interaction system
Martina et al. A virtual assistant for the movie domain exploiting natural language preference elicitation strategies
CN117851577B (en) Government service question-answering method based on knowledge graph enhanced large language model
CN117149957B (en) Text processing method, device, equipment and medium
US11770352B2 (en) Method and apparatus for providing chat service including expression items
CN114385903B (en) Application account identification method and device, electronic equipment and readable storage medium
US20230196105A1 (en) Generating labeled training data using a pre-trained language model neural network

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination