CN117350407A - Model processing method, device, electronic equipment and readable storage medium - Google Patents

Model processing method, device, electronic equipment and readable storage medium

Info

Publication number
CN117350407A
CN117350407A (Application CN202311549066.1A)
Authority
CN
China
Prior art keywords
task
data
model
template
atomic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311549066.1A
Other languages
Chinese (zh)
Inventor
于皓
张�杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Zhongguancun Kejin Technology Co Ltd
Original Assignee
Beijing Zhongguancun Kejin Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Zhongguancun Kejin Technology Co Ltd filed Critical Beijing Zhongguancun Kejin Technology Co Ltd
Priority to CN202311549066.1A priority Critical patent/CN117350407A/en
Publication of CN117350407A publication Critical patent/CN117350407A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The application discloses a model processing method, a model processing device, electronic equipment and a readable storage medium, and belongs to the technical field of artificial intelligence. The model processing method in the embodiment of the application comprises the following steps: acquiring training data; generating first prompt data and second prompt data according to the training data and a pre-established prompt template; the first prompt data is related to a first task, and the first task at least comprises a target atomic task; the second prompt data is related to a second task, the second task comprises the target atomic task, and the number of atomic tasks included in the second task is greater than the number of atomic tasks included in the first task; training a pre-training model by using the first prompt data to obtain an initial task execution model; and adjusting the initial task execution model by using the second prompt data to obtain a target task execution model. Therefore, the task execution capability of the model can be effectively improved.

Description

Model processing method, device, electronic equipment and readable storage medium
Technical Field
The application belongs to the technical field of artificial intelligence, and particularly relates to a model processing method, a model processing device, electronic equipment and a readable storage medium.
Background
In the related art, in order to improve the execution capability of a task execution model in the face of complex tasks, new tasks, and the like, the task execution model is generally trained by increasing the model size and/or increasing the amount of training data. However, increases in model size and in the amount of training data are generally limited by training costs and the like, which may make it impossible to effectively enhance the task execution capability of the model.
Disclosure of Invention
An objective of the embodiments of the present application is to provide a method, an apparatus, an electronic device, and a readable storage medium for processing a model, so as to solve the problem that the task execution capability of the model cannot be effectively improved at present.
In order to solve the technical problems, the application is realized as follows:
in a first aspect, a method for processing a model is provided, including:
acquiring training data;
generating first prompt data and second prompt data according to the training data and a pre-established prompt template; wherein the first prompt data is related to a first task, and the first task at least comprises a target atomic task; the second prompt data is related to a second task, the second task comprises the target atomic task, and the number of atomic tasks included in the second task is greater than the number of atomic tasks included in the first task;
training a pre-training model by using the first prompt data to obtain an initial task execution model;
and adjusting the initial task execution model by using the second prompt data to obtain a target task execution model.
In a second aspect, there is provided a model processing apparatus including:
the acquisition module is used for acquiring training data;
the generation module is used for generating first prompt data and second prompt data according to the training data and a pre-established prompt template; wherein the first prompt data is related to a first task, and the first task at least comprises a target atomic task; the second prompt data is related to a second task, the second task comprises the target atomic task, and the number of atomic tasks included in the second task is greater than the number of atomic tasks included in the first task;
the training module is used for training the pre-training model by utilizing the first prompt data to obtain an initial task execution model;
and the adjusting module is used for adjusting the initial task execution model by utilizing the second prompt data to obtain a target task execution model.
In a third aspect, there is provided an electronic device comprising a processor, a memory and a program or instruction stored on the memory and executable on the processor, the program or instruction when executed by the processor implementing the steps of the method according to the first aspect.
In a fourth aspect, there is provided a readable storage medium having stored thereon a program or instructions which when executed by a processor perform the steps of the method according to the first aspect.
In the embodiment of the application, after the training data is acquired, first prompt data and second prompt data are generated according to the training data and a pre-established prompt template; the first prompt data is related to a first task, and the first task at least comprises a target atomic task; the second prompt data is related to a second task, the second task comprises the target atomic task, and the number of atomic tasks included in the second task is greater than the number of atomic tasks included in the first task; a pre-training model is trained by using the first prompt data to obtain an initial task execution model; and the initial task execution model is adjusted by using the second prompt data to obtain a target task execution model. In this way, the task execution model can be gradually adjusted/optimized according to the complexity of the task. While building up understanding and cognition of fine-grained tasks/atomic tasks, the model also learns the logical relationships between those fine-grained tasks/atomic tasks in the process of learning how to solve a complex task, so that it essentially understands the solution process and logic of the complex task, builds the capability of executing complex tasks, and can exhibit an abrupt increase in capability when executing complex tasks, thereby effectively improving the task execution capability of the model.
Drawings
FIG. 1 is a flow chart of a model processing method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a hierarchical instruction task chain in an embodiment of the present application;
FIG. 3 is a schematic illustration of a model process in an embodiment of the present application;
fig. 4 is a schematic structural diagram of a model processing device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and fully with reference to the accompanying drawings, in which it is evident that the embodiments described are some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
The terms first, second and the like in the description and in the claims, are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged, as appropriate, such that embodiments of the present application may be implemented in sequences other than those illustrated or described herein, and that the objects identified by "first," "second," etc. are generally of a type and not limited to the number of objects, e.g., the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/", generally means that the associated object is an "or" relationship.
In order to solve the problem that the task execution capability of a model cannot currently be effectively improved, and considering that the capability emergence of a model may be related to an insight-like phenomenon in the model, the embodiments of the present application provide a multi-level instruction task chain for decomposing a complex task. The association between the complex task and simple tasks is established, and the internal logic between fine-grained tasks is also established, so that, while learning how to solve the complex task, the model must focus on solving each fine-grained task and at the same time understand the logical relationships between the fine-grained tasks. The model thus essentially understands the solution process and logic of the complex task, can provide effective fine-grained subtasks for solving other complex tasks, and can easily derive the solution logic of a complex task through the combination of simple fine-grained subtasks, so that the capability emergence phenomenon of the model appears rapidly.
The model processing method, device, electronic equipment and readable storage medium provided in the embodiments of the present application are described in detail below with reference to the accompanying drawings by means of specific embodiments and application scenarios thereof.
Referring to fig. 1, fig. 1 is a flowchart of a model processing method provided in an embodiment of the present application, where the method is applied to an electronic device, as shown in fig. 1, and the method includes the following steps:
step 11: acquiring training data;
step 12: generating first prompt data and second prompt data according to the training data and a pre-established prompt template; the first prompt data is related to a first task, and the first task at least comprises a target atomic task; the second prompt data is related to a second task, the second task comprises the target atomic task, and the number of atomic tasks included in the second task is greater than the number of atomic tasks included in the first task;
step 13: training a pre-training model by using the first prompt data to obtain an initial task execution model;
step 14: and adjusting the initial task execution model by using the second prompt data to obtain a target task execution model.
In the embodiment of the present application, the training data may be selected from documents and question-answer (QA) pair data in a document question-answering scenario, and may also be selected from text data in other scenarios, which is not limited herein.
The pre-trained model may be an open-source large model, such as the Baichuan large model, ChatGLM-6B, LLaMA, and the like. The prompt template is an input template built/created for a task (e.g., a task containing a single atomic task or multiple atomic tasks) that helps construct the prompt data used for training the model.
The first task and the second task may be domain-specific tasks, such as tasks in the financial, medical, or communication domains. An atomic task can be understood as a task unit that implements a basic function. One complex task may include multiple atomic tasks, and the greater the number of atomic tasks included, the greater the complexity of the corresponding complex task.
Optionally, the second task may be understood as a hierarchical instruction task chain/complex task, where the atomic tasks have inherent association relationships between them.
Optionally, the atomic task may be selected from, but not limited to, any of the following: a word segmentation task, a part-of-speech tagging task, a text classification task, an entity disambiguation task, an entity extraction task, a text generation task, a reading comprehension task, a syntax analysis task, and the like.
Optionally, the first task includes only a target atomic task, such as a word segmentation task or a text classification task. In this case, the model may first be trained based on the first prompt data related only to the target atomic task, and then the model may be gradually fine-tuned based on the second prompt data related to a complex task that includes the target atomic task, until the model exhibits an abrupt increase in capability when executing the complex task.
According to the above scheme, the task execution model can be gradually optimized/adjusted according to the complexity of the task. While building up understanding and cognition of fine-grained tasks/atomic tasks, the model also learns the logical relationships between fine-grained tasks/atomic tasks in the process of learning the complex task; it constructs the capability of executing the complex task by solving the fine-grained tasks/atomic tasks, essentially understands the solution process and logic of the complex task, and can exhibit an abrupt increase in capability when executing the complex task, thereby effectively improving the task execution capability of the model. In addition, through the scheme in the application, the generalization capability of the model on complex tasks in the target field can be improved, and the difficulty of capability emergence on complex tasks is reduced.
It can be understood that, through the operation of adjusting the model, the logical relationships between atomic tasks can be injected into the model through complex tasks, so that the model continuously understands the interactions between atomic tasks and thereby builds an internal representation of the logic between them. On that basis the model can be generalized to more complex tasks, and so on, continuously expanding its ability to solve increasingly complex tasks and continuously improving its capability across multiple levels.
Optionally, the second prompt data is related to a plurality of second tasks, the plurality of second tasks are divided into n levels, the number of atomic tasks included in the i-th level second task is smaller than the number of atomic tasks included in the j-th level second task, and 1 ≤ i < j ≤ n; i.e., the complexity of the second task increases as the level among the n levels increases. The adjusting the initial task execution model using the second prompt data to obtain a target task execution model may include:
and adjusting the initial task execution model by using the second prompt data related to the second tasks of each of the n levels in turn, to obtain the target task execution model. In this way, the initial task execution model can be gradually optimized/adjusted according to the complexity of the task, thereby facilitating the emergence of the model's capability on complex tasks.
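The staged adjustment described above can be sketched as a short loop over the n levels. The snippet below is only an illustrative Python sketch under assumed names: fine_tune and the level-wise organization of the second prompt data are placeholders, not part of the disclosed method.

def adjust_over_levels(initial_model, second_prompt_data_by_level, fine_tune):
    # second_prompt_data_by_level: datasets ordered from level 1 to level n,
    # where each later level contains second tasks with more atomic tasks.
    # fine_tune(model, data) returns the model adjusted on that data.
    model = initial_model
    for level_data in second_prompt_data_by_level:
        model = fine_tune(model, level_data)  # adjust with the level-i second prompt data
    return model  # target task execution model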
Optionally, before step 12, the model processing method in this embodiment may further include: first, preliminarily establishing a prompt template for the instruction task according to a pre-established hierarchical instruction task chain in the target field; and then expanding the preliminarily established prompt template by using a pre-trained language model to obtain the pre-established prompt template. For example, the pre-trained language model is GPT-4 or the like. Thus, by means of the expansion performed by the language model, the prompt templates can be enriched, which facilitates subsequent model training and adjustment.
Optionally, to ensure correctness, after the expanded prompt templates are obtained, they may be manually checked, optimized, and verified.
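As an illustration only, the template expansion step could be implemented as a call to a language model that paraphrases each seed template into several variants; call_language_model below is a hypothetical placeholder for whatever LLM interface (e.g. GPT-4) is actually used, and the instruction wording is assumed rather than taken from this disclosure.

def call_language_model(instruction: str) -> str:
    raise NotImplementedError("placeholder for the actual LLM interface")

def expand_templates(seed_templates, n_variants=5):
    # Ask the language model to rewrite each preliminarily established template
    # into differently worded variants; the results are then manually checked,
    # optimized and verified as described above.
    expanded = []
    for template in seed_templates:
        instruction = (
            "Rewrite the following instruction-task prompt template into "
            + str(n_variants)
            + " differently worded variants, keeping the #Task#/#Input#/#Output# "
            "structure unchanged:\n" + template
        )
        reply = call_language_model(instruction)
        expanded.extend(line for line in reply.splitlines() if line.strip())
    return seed_templates + expanded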
Optionally, the target field includes, but is not limited to, the text processing field and the like. The hierarchical instruction task chain may be built for complex tasks in a specific scenario, such as a document question-and-answer scenario, in combination with natural language processing (Natural Language Processing, NLP) instruction tasks.
Optionally, the NLP instruction task includes, but is not limited to: (1) Interpretation/description classes including, but not limited to, noun/term interpretation, semantic interpretation, behavioral interpretation, cause interpretation, etc.; (2) Authoring classes including, but not limited to, writing poems, writing songs, writing articles, writing comments, etc.; (3) Code classes including, but not limited to, write annotation, code generation, code completion, grammar testing, and the like; (4) Description classes including, but not limited to, task/role descriptions, state descriptions, phenomenon descriptions, fact descriptions, and the like.
Optionally, the hierarchical instruction task chain may be as shown in Fig. 2, where each branch may be understood as one hierarchical instruction task chain and each node represents an atomic task. For example, the upper-level task of the atomic task "word segmentation" is "syntax analysis", whose upper-level task is "keyword recognition", whose upper-level task is "semantic understanding", whose upper-level task is "punctuation", whose upper-level task is "context understanding". For another example, the upper-level task of the atomic task "noun/verb division" is "lexical analysis", whose upper-level task is "keyword recognition", whose upper-level task is "Query understanding"; and so on.
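For illustration, one branch of such a hierarchical instruction task chain could be represented as a small tree of atomic-task nodes, as sketched below in Python; the data structure and helper function are assumptions used only to make the example chain above concrete.

from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class TaskNode:
    name: str                                    # an atomic task, e.g. "word segmentation"
    children: List["TaskNode"] = field(default_factory=list)

    def chain_to_root(self, target: str, path=None) -> Optional[List[str]]:
        # Return the chain from `target` up to this root node, if present.
        path = (path or []) + [self.name]
        if self.name == target:
            return list(reversed(path))
        for child in self.children:
            found = child.chain_to_root(target, path)
            if found:
                return found
        return None

# The first example chain from the text, from the top-level task downwards.
root = TaskNode("context understanding", [
    TaskNode("punctuation", [
        TaskNode("semantic understanding", [
            TaskNode("keyword recognition", [
                TaskNode("syntax analysis", [
                    TaskNode("word segmentation"),
                ]),
            ]),
        ]),
    ]),
])

print(root.chain_to_root("word segmentation"))
# ['word segmentation', 'syntax analysis', 'keyword recognition',
#  'semantic understanding', 'punctuation', 'context understanding']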
Optionally, the prompt template includes a task part, an input data part, and an output result part. The process of generating the first prompt data and the second prompt data according to the training data and the pre-established prompt template may include: adding information of the first task to the task part of the prompt template, adding the training data to the input data part of the prompt template, and adding the result obtained by executing the first task on the training data to the output result part of the prompt template, so as to obtain the first prompt data; and adding information of the second task to the task part of the prompt template, adding the training data to the input data part of the prompt template, and adding the result obtained by executing the second task on the training data to the output result part of the prompt template, so as to obtain the second prompt data.
Optionally, the prompt template further includes an output JSON format specification to normalize the output result.
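A minimal sketch of filling this template is given below; the #Task#/#Input#/#Output# field names follow the examples later in this description, while the function name and the JSON handling are assumptions made for illustration only.

import json

PROMPT_TEMPLATE = (
    "#Task#: {task}\n"
    "#Input#: {input_text}\n"
    "Output JSON format: {output_format}\n"
    "#Output#:\n{output_result}"
)

def build_prompt_data(task, input_text, output_result, output_format="{}"):
    # Fill the task part, input data part and output result part of the template.
    return PROMPT_TEMPLATE.format(
        task=task,
        input_text=input_text,
        output_format=output_format,
        output_result=json.dumps(output_result, ensure_ascii=False),
    )

# First prompt data: a single target atomic task (word segmentation).
first_prompt = build_prompt_data(
    task="Perform word segmentation on the input sentence and output the words that are nouns.",
    input_text="Region A is an integral part of country B.",
    output_format='{"word segmentation result": [], "noun": []}',
    output_result={
        "word segmentation result": ["Region A", "is", "country B", "indivisible", "part"],
        "noun": ["Region A", "country B", "part"],
    },
)

# Second prompt data: a complex task containing the same atomic task plus more.
second_prompt = build_prompt_data(
    task="Perform word segmentation, select the three most important words and make two sentences with them.",
    input_text="Region A is an integral part of country B.",
    output_format='{"words": [], "sentence making": []}',
    output_result={
        "words": ["Region A", "country B", "indivisible"],
        "sentence making": ["Region A is part of country B",
                            "Country B considers Region A indivisible"],
    },
)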
For example, taking word segmentation as the target atomic task, several examples of the first prompt data may be as follows:
1) #Task#: Perform word segmentation on the input sentence.
#Input#: Region A is an integral part of country B.
#Output#:
{Region A is an integral part of country B}
2) #Task#: Perform word segmentation on the input sentence and output the words that are nouns.
#Input#: Region A is an integral part of country B.
Output JSON format: {"word segmentation result": [], "noun": []}
#Output#:
{"word segmentation result": ["Region A", "is", "country B", "indivisible", "part"], "noun": ["Region A", "country B", "part"]}
3) #Task#: Perform word segmentation on the input sentence and output the words that are function words.
#Input#: Region A is an integral part of country B.
Output JSON format: {"word segmentation result": [], "function word": []}
#Output#:
{"word segmentation result": ["Region A", "is", "country B", "indivisible", "part"], "function word": ["is", ""]}
4) #Task#: Perform word segmentation on the input sentence and output the words that form a verb-object relation.
#Input#: Region A is an integral part of country B.
Output JSON format: {"word segmentation result": [], "predicate, object": []}
#Output#:
{"word segmentation result": ["Region A", "is", "country B", "indivisible", "part"], "predicate, object": ["is", "part"]}
Several examples of the second prompt data may be as follows:
(1) #Task#: Perform word segmentation on the input sentence, select the three most important words, and make sentences with them, where the number of sentences is 2.
#Input#: Region A is an integral part of country B.
Output JSON format: {"words": [], "sentence making": []}
#Output#:
{"words": ["Region A", "country B", "indivisible"], "sentence making": ["Region A is part of country B", "Country B considers Region A indivisible"]}
(2) #Task#: Perform word segmentation on the input sentence and output the 2 most important words and the corresponding word counts.
#Input#: Region A is an integral part of country B.
Output JSON format: {word, word count}
#Output#:
{"Region A": "2", "country B": "2"}
It should be noted that the above examples of the first prompt data and the second prompt data are only intended to illustrate and help understand the schemes in this application. Different first prompt data and second prompt data may be generated according to the specific training data, training requirements, and corresponding instruction task chains, which is not limited herein.
Referring to Fig. 3, after the first prompt data and the second prompt data are generated, model training may be performed based on the first prompt data, where the tasks involved may include, but are not limited to: performing a word segmentation task on the following sentence, extracting product-type entities in the following sentence, answering questions based on the information provided by the context, indicating the presence of pronouns in the input, and the like. The model is then fine-tuned according to the second prompt data, so that the finally obtained model has the understanding and execution capabilities for atomic tasks (such as word segmentation, part-of-speech tagging, text classification, syntactic analysis, text generation, and the like) while also understanding the logic of complex tasks, such as the task logic, task relevance, task hierarchy, task interaction, and task reasoning among atomic tasks. Afterwards, when facing other complex tasks, such as "How about financial products 1 and 2? What is the starting point of the latter's financial term?", the model can understand the logic of the complex task, divide it into effective fine-grained subtasks, and obtain the solution logic of the complex task through the combination of those fine-grained subtasks, thereby producing a correct output result.
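For concreteness, the two training stages in Fig. 3 could be run as ordinary supervised fine-tuning of a causal language model on the prompt data. The sketch below assumes a Hugging Face model; the model path, hyperparameters, and data handling are illustrative placeholders rather than details taken from this disclosure.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def fine_tune(model_name_or_path, prompt_texts, epochs=1, lr=1e-5):
    # Next-token-prediction fine-tuning on a list of prompt-data strings.
    tokenizer = AutoTokenizer.from_pretrained(model_name_or_path)
    model = AutoModelForCausalLM.from_pretrained(model_name_or_path)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for text in prompt_texts:                       # one sample per step, for simplicity
            batch = tokenizer(text, return_tensors="pt", truncation=True)
            loss = model(**batch, labels=batch["input_ids"]).loss
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
    return model

# Stage 1 (step 13): call fine_tune on the first prompt data to obtain the
# initial task execution model. Stage 2 (step 14): save that model and continue
# fine-tuning it on the second prompt data, level by level, to obtain the
# target task execution model.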
It should be noted that, in the model processing method provided in the embodiment of the present application, the execution subject may be a model processing apparatus, or a control module in the model processing apparatus for executing the model processing method. In the embodiment of the present application, the model processing apparatus is described by taking the model processing apparatus executing the model processing method as an example.
Referring to fig. 4, fig. 4 is a schematic structural diagram of a model processing apparatus according to an embodiment of the present application, where the apparatus is applied to an electronic device, and as shown in fig. 4, a model processing apparatus 40 includes:
an acquisition module 41, configured to acquire training data;
a generating module 42, configured to generate first prompt data and second prompt data according to the training data and a pre-established prompt template; wherein the first prompt data is related to a first task, and the first task at least comprises a target atomic task; the second prompt data is related to a second task, the second task comprises the target atomic task, and the number of atomic tasks included in the second task is greater than the number of atomic tasks included in the first task;
a training module 43, configured to train a pre-training model by using the first prompt data to obtain an initial task execution model;
and an adjustment module 44, configured to adjust the initial task execution model by using the second prompt data to obtain a target task execution model.
Optionally, the second prompt data is related to a plurality of second tasks, the plurality of second tasks are divided into n levels, the number of atomic tasks included in the i-th level second task is smaller than the number of atomic tasks included in the j-th level second task, and 1 ≤ i < j ≤ n;
the adjustment module 44 is specifically configured to: adjust the initial task execution model by using the second prompt data related to the second tasks of each of the n levels in turn, to obtain the target task execution model.
Optionally, the prompt template includes a task part, an input data part, and an output result part; the generating module 42 is specifically configured to: add information of the first task to the task part of the prompt template, add the training data to the input data part of the prompt template, and add the result obtained by executing the first task on the training data to the output result part of the prompt template, so as to obtain the first prompt data; and add information of the second task to the task part of the prompt template, add the training data to the input data part of the prompt template, and add the result obtained by executing the second task on the training data to the output result part of the prompt template, so as to obtain the second prompt data.
Optionally, the model processing apparatus 40 further includes:
an establishing module, configured to preliminarily establish a prompt template for the instruction task according to a pre-established hierarchical instruction task chain in the target field, before the first prompt data and the second prompt data are generated according to the training data and the pre-established prompt template;
and an expansion module, configured to expand the preliminarily established prompt template by using a pre-trained language model to obtain the pre-established prompt template.
Optionally, the first task includes only the target atomic task.
Optionally, the atomic task is any one of the following:
a word segmentation task, a part-of-speech tagging task, a text classification task, an entity disambiguation task, an entity extraction task, a text generation task, a reading comprehension task, and a syntax analysis task.
The model processing device 40 of the embodiment of the present application may implement each process of the method embodiment shown in fig. 1 and achieve the same technical effects, and in order to avoid repetition, the description is omitted here.
Optionally, as shown in fig. 5, the embodiment of the present application further provides an electronic device 50, including a processor 51, a memory 52, and a program or an instruction stored in the memory 52 and capable of running on the processor 51, where the program or the instruction implements each process of the embodiment of the model processing method when executed by the processor 51, and the process can achieve the same technical effect, and for avoiding repetition, a description is omitted herein.
The embodiment of the application also provides a readable storage medium, on which a program or an instruction is stored, where the program or the instruction can implement each process of the embodiment of the model processing method and achieve the same technical effect when executed by a processor, and in order to avoid repetition, a detailed description is omitted here.
Computer-readable media include both permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium which can be used to store information that can be accessed by a computing device. Computer-readable media, as defined herein, do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The foregoing embodiment numbers of the present application are merely for describing, and do not represent advantages or disadvantages of the embodiments.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general hardware platform, and of course may also be implemented by hardware, but in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk), comprising several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to perform the methods described in the embodiments of the present application.
The foregoing is merely a preferred embodiment of the present application, and it should be noted that modifications and improvements may be made by those skilled in the art without departing from the principles of the present application, and such modifications and improvements are also intended to be within the scope of the present application.

Claims (10)

1. A model processing method applied to an electronic device, comprising:
acquiring training data;
generating first prompt data and second prompt data according to the training data and a pre-established prompt template; wherein the first prompt data is related to a first task, and the first task at least comprises a target atomic task; the second prompt data is related to a second task, the second task comprises the target atomic task, and the number of atomic tasks included in the second task is greater than the number of atomic tasks included in the first task;
training a pre-training model by using the first prompt data to obtain an initial task execution model;
and adjusting the initial task execution model by using the second prompt data to obtain a target task execution model.
2. The method of claim 1, wherein the second prompt data is related to a plurality of second tasks, the plurality of second tasks being divided into n levels, the i-th level second task comprising a smaller number of atomic tasks than the j-th level second task, 1 ≤ i < j ≤ n;
the step of adjusting the initial task execution model by using the second prompt data to obtain a target task execution model includes:
and adjusting the initial task execution model by using the second prompt data related to the second task of each of the n levels in turn to obtain the target task execution model.
3. The method of claim 1, wherein the prompt template comprises a task part, an input data part, and an output result part;
the generating the first prompt data and the second prompt data according to the training data and the pre-established prompt template comprises:
adding information of the first task to the task part of the prompt template, adding the training data to the input data part of the prompt template, and adding a result obtained by executing the first task on the training data to the output result part of the prompt template, to obtain the first prompt data;
the method comprises the steps of adding information of the second task to a task part in the template of the promt, adding the training data to an input data part in the template of the promt, and adding a result obtained by executing the second task on the training data to an output result part in the template of the promt to obtain the second promt data.
4. A method according to any one of claims 1 to 3, wherein before generating the first prompt data and the second prompt data from the training data and a pre-established prompt template, the method further comprises:
preliminarily establishing a prompt template for the instruction task according to a pre-established hierarchical instruction task chain in the target field;
and expanding the preliminarily established prompt template by using a pre-trained language model to obtain the pre-established prompt template.
5. The method of claim 1, wherein the first task comprises only the target atomic task.
6. The method of claim 1, wherein the atomic task is any one of:
a word segmentation task, a part-of-speech tagging task, a text classification task, an entity disambiguation task, an entity extraction task, a text generation task, a reading comprehension task, and a syntax analysis task.
7. A model processing apparatus, comprising:
the acquisition module is used for acquiring training data;
the generation module is used for generating first prompt data and second prompt data according to the training data and a pre-established prompt template; wherein the first prompt data is related to a first task, and the first task at least comprises a target atomic task; the second prompt data is related to a second task, the second task comprises the target atomic task, and the number of atomic tasks included in the second task is greater than the number of atomic tasks included in the first task;
the training module is used for training the pre-training model by utilizing the first prompt data to obtain an initial task execution model;
and the adjusting module is used for adjusting the initial task execution model by utilizing the second prompt data to obtain a target task execution model.
8. The apparatus of claim 7, wherein the second prompt data is related to a plurality of second tasks, the plurality of second tasks being divided into n levels, the number of atomic tasks included in the i-th level second task being smaller than the number of atomic tasks included in the j-th level second task, 1 ≤ i < j ≤ n;
the adjusting module is specifically configured to: adjust the initial task execution model by using the second prompt data related to the second tasks of each of the n levels in turn, to obtain the target task execution model.
9. An electronic device comprising a processor, a memory and a program or instruction stored on the memory and executable on the processor, which when executed by the processor implements the steps of the model processing method of any one of claims 1 to 6.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a program or instructions which, when executed by a processor, implement the steps of the model processing method according to any of claims 1 to 6.
CN202311549066.1A 2023-11-20 2023-11-20 Model processing method, device, electronic equipment and readable storage medium Pending CN117350407A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311549066.1A CN117350407A (en) 2023-11-20 2023-11-20 Model processing method, device, electronic equipment and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311549066.1A CN117350407A (en) 2023-11-20 2023-11-20 Model processing method, device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN117350407A true CN117350407A (en) 2024-01-05

Family

ID=89359685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311549066.1A Pending CN117350407A (en) 2023-11-20 2023-11-20 Model processing method, device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN117350407A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420123A (en) * 2021-06-24 2021-09-21 中国科学院声学研究所 Language model training method, NLP task processing method and device
CN116959433A (en) * 2023-09-18 2023-10-27 腾讯科技(深圳)有限公司 Text processing method, device, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
Alva-Manchego et al. Data-driven sentence simplification: Survey and benchmark
US20220343082A1 (en) System and method for ensemble question answering
CN117149989B (en) Training method for large language model, text processing method and device
CN116483982B (en) Knowledge question-answering method, knowledge question-answering device, electronic equipment and readable storage medium
Stancheva et al. A model for generation of test questions
Wen et al. CQACD: A concept question-answering system for intelligent tutoring using a domain ontology with rich semantics
Mirzababaei et al. Developing a conversational agent’s capability to identify structural wrongness in arguments based on toulmin’s model of arguments
Kulkarni et al. Applied Generative AI for Beginners
Lyu et al. Deep learning for textual entailment recognition
Galanis et al. Machine learning meets natural language processing-the story so far
WO2023224862A1 (en) Hybrid model and system for predicting quality and identifying features and entities of risk controls
CN117350407A (en) Model processing method, device, electronic equipment and readable storage medium
Fang Proposition-based summarization with a coherence-driven incremental model
Alarcón et al. Hulat-ALexS CWI Task-CWI for Language and Learning Disabilities Applied to University Educational Texts.
He et al. Entire information attentive GRU for text representation
Chakkarwar et al. A Review on BERT and Its Implementation in Various NLP Tasks
CN111126066B (en) Method and device for determining Chinese congratulation technique based on neural network
Khatri et al. SkillBot: Towards Data Augmentation using Transformer language model and linguistic evaluation
Mücke et al. Fine-Tuning Language Models for Scientific Writing Support
Sticha Utilizing large language models for question answering in task-oriented dialogues
Graesser et al. Computational modeling of discourse and conversation
Abdul-Kader et al. Automatic Web-Based Question Answer Generation System for Online Feedable New-Born Chatbot
Strawn Masterminds of Generative AI: Vaswani and Altman
US12019989B2 (en) Open domain dialog reply method and system based on thematic enhancement
Hausser Modeling Natural Language Communication in Database Semantics.

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination