CN115129826B - Electric power field model pre-training method, fine tuning method, device and equipment - Google Patents

Electric power field model pre-training method, fine tuning method, device and equipment

Info

Publication number
CN115129826B
CN115129826B
Authority
CN
China
Prior art keywords
training
model
layer
electric power
size
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202211060951.9A
Other languages
Chinese (zh)
Other versions
CN115129826A (en)
Inventor
宋博川
张强
周飞
刘同阳
范晓宣
贾全烨
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Smart Grid Research Institute Co ltd
State Grid Corp of China SGCC
Original Assignee
State Grid Smart Grid Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Smart Grid Research Institute Co ltd filed Critical State Grid Smart Grid Research Institute Co ltd
Priority to CN202211060951.9A priority Critical patent/CN115129826B/en
Publication of CN115129826A publication Critical patent/CN115129826A/en
Application granted granted Critical
Publication of CN115129826B publication Critical patent/CN115129826B/en
Priority to PCT/CN2023/115522 priority patent/WO2024046316A1/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Compression, Expansion, Code Conversion, And Decoders (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a pre-training method, a fine-tuning method, a device, and equipment for a model in the power field. The pre-training method comprises the following steps: acquiring original electric power corpus data; processing the original electric power corpus data, wherein the processing at least comprises word segmentation; constructing a pre-training corpus for the electric power field model from the processed electric power corpus data by a full-word masking method; constructing an electric power field model, wherein the electric power field model comprises an attention matrix into which relative position encoding between words is introduced; and pre-training the electric power field model with the pre-training corpus. The technical scheme provided by the invention can improve the migration capability of the pre-training model.

Description

Electric power field model pre-training method, fine tuning method, device and equipment
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a pre-training method, a fine-tuning method, a device, and equipment for a model in the power field.
Background
Existing natural language processing (NLP) models can contain millions of parameters, so training an NLP model with good performance requires a large number of training samples and labels. Training samples are typically labeled manually, so acquiring a large amount of labeled data incurs a high labor cost.
In this context, the pre-training-plus-fine-tuning paradigm is widely applied to NLP model training. A pre-trained model is first trained using relatively low-cost and readily available training data, which allows it to learn general linguistic knowledge. For each downstream task, the relevant parameters can then be fine-tuned with task-specific labeled data, so that the resulting NLP model performs well.
However, in the pre-training stage the natural language processing model is trained not on the downstream task but on a pre-training task (for example, predicting masked words). As a result, the migration capability of the pre-trained model is weak: when a model for a downstream task is obtained by fine-tuning the pre-trained model, its adaptability is poor and its prediction accuracy is low.
Disclosure of Invention
In view of this, embodiments of the present invention provide a pre-training method, a fine-tuning method, an apparatus, and a device for a model in the electric power domain, so as to solve the problem that a natural language processing pre-training model has weak migration capability.
According to a first aspect, an embodiment of the present invention provides a power domain model pre-training method, where the method includes:
acquiring original electric power corpus data;
processing the original electric power corpus data, wherein the processing at least comprises word segmentation processing;
constructing a pre-training corpus of the electric power field model by adopting a full-word masking method for the electric power corpus data obtained after processing;
constructing an electric power field model, wherein the electric power field model comprises an attention matrix, and the attention matrix introduces word-to-word relative position codes;
and pre-training the electric power field model by utilizing the pre-training corpus.
Optionally, the algorithm formula of the attention matrix introduced with relative position encoding between words is:
Attention_rel(Q, K, V) = Attention(Q, K, V) + rel
wherein Attention(Q, K, V) is the algorithm formula of the attention matrix without the relative position encoding, and rel is a parameter related to the relative position between words.
Optionally, the processing the original electric power corpus data includes:
and performing word segmentation processing on the original power corpus data by adopting a BERT-CRF model and a power field dictionary, wherein the BERT-CRF model is obtained by training power word segmentation corpus.
Optionally, the constructing the pre-training corpus of the electric power field model by using a full-word masking method on the electric power corpus data obtained after the processing includes:
and carrying out random whole-word masking on the electric power corpus data obtained after the processing by adopting a preset probability, replacing one part of characters corresponding to all words to be masked with random characters, replacing the other part of the characters with masking symbols, and keeping the rest part of the characters unchanged.
According to a second aspect, an embodiment of the present invention provides a fine tuning method for a power domain model, including:
constructing a training data set aiming at a downstream task;
the method comprises the steps that other network structures except an output layer in a pre-training model in the power field are used as a bottom layer encoder, an output layer network structure is built according to a downstream task, the output layer network structure is connected to the bottom layer encoder, and then the power field model for the downstream task is obtained, pre-training linguistic data of the pre-training model in the power field are obtained by carrying out word segmentation processing on original power linguistic data and then adopting full word masking, the pre-training model in the power field comprises an attention matrix, and relative position coding between words is introduced into the attention matrix;
and training the power domain model aiming at the downstream task by utilizing the training data set.
Optionally, the downstream task is a classification task, and the output layer network structure is a full-connection network; and a first network structure is also included between the bottom layer encoder and the fully connected network;
the first network structure is used for extracting coding vectors of a first layer and a last layer in the bottom layer coder, averaging the coding vectors to obtain a first coding vector, and averaging the first coding vectors of all words to obtain the coding vector of the bottom layer coder;
the fully-connected network is used for outputting the confidence corresponding to each category based on the coding vector of the bottom layer coder.
Optionally, the downstream task is a sequence tagging task, the output layer network structure is a conditional random field, and a Dropout layer and a mapping layer are further included between the bottom layer encoder and the conditional random field layer;
the output of the bottom layer encoder is a (batch_size, time_steps, hidden_size)-shaped tensor, wherein batch_size is the batch size, time_steps is the sequence length, and hidden_size is the hidden layer unit size of the bottom layer encoder;
the output of the bottom layer encoder is converted into a (batch_size, time_steps, num_classes)-shaped tensor through the Dropout layer and the mapping layer, wherein num_classes is the number of target classes;
the conditional random field layer is used to derive a label for each element in the entire sequence based on the (batch_size, time_steps, num_classes)-shaped tensor.
According to a third aspect, an embodiment of the present invention provides an apparatus for pre-training a model in an electric power domain, including:
the acquisition module is used for acquiring original electric power corpus data;
the processing module is used for processing the original electric power corpus data, and the processing at least comprises word segmentation processing;
the first construction module is used for constructing the pre-training corpus of the electric power field model by adopting a full-word masking method for the electric power corpus data obtained after processing;
the second construction module is used for constructing an electric power field model, wherein the electric power field model comprises an attention matrix, and the attention matrix introduces relative position codes among words;
and the pre-training module is used for pre-training the electric power field model by utilizing the pre-training corpus.
According to a fourth aspect, an embodiment of the present invention provides a fine tuning apparatus for a power domain model, including:
the third construction module is used for constructing a training data set aiming at the downstream task;
the fourth construction module is used for taking other network structures except an output layer in the electric power field pre-training model as a bottom layer encoder, constructing an output layer network structure according to the downstream task, connecting the output layer network structure to the bottom layer encoder, and then obtaining an electric power field model aiming at the downstream task, wherein the pre-training corpus of the electric power field pre-training model is obtained by carrying out word segmentation processing on original electric power corpus data and then adopting full word masking, the electric power field pre-training model comprises an attention matrix, and the attention matrix introduces relative position codes between words;
and the training module is used for training the power domain model aiming at the downstream task by utilizing the data set for training.
According to a fifth aspect, an embodiment of the present invention provides an electronic device, including:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory being configured to store a computer program, and the computer program, when executed by the processor, implementing any one of the pre-training methods for a power domain model according to the first aspect or implementing the fine-tuning method for any one of the power domain models according to the second aspect.
According to a sixth aspect, an embodiment of the present invention provides a computer-readable storage medium for storing a computer program, which when executed by a processor, implements any of the above-mentioned pre-training methods for a power domain model of the first aspect, or implements any of the above-mentioned fine tuning methods for a power domain model of the second aspect.
In the embodiment of the invention, the pre-training corpus of the electric power field model is constructed by full-word masking. This avoids the problem that arises when the pre-training corpus is constructed by masking individual characters, namely that the model can easily guess the masked characters and thus neglect the semantic information between words and the whole sentence, so the migration capability of the pre-training model can be improved. In addition, the embodiment of the invention introduces relative position modeling between words into the constructed pre-training model, i.e., the electric power field model, specifically by adding an attention matrix that incorporates relative position encoding between words. The model therefore pays more attention, and is more sensitive, to the relative positions between words, so that the pre-trained electric power field model is not only suitable for the masked-word prediction task of the pre-training stage but is also easier to migrate to downstream tasks.
Drawings
The features and advantages of the present invention will be more clearly understood by reference to the accompanying drawings, which are schematic and are not to be understood as limiting the invention in any way, and in which:
fig. 1 is a schematic flow chart of a pre-training method for a model in an electric power domain according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating a process of processing the raw power corpus data according to an embodiment of the present invention;
fig. 3 is a schematic flow chart of a method for fine tuning a model in an electric power domain according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of an electric power domain model pre-training apparatus according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a fine tuning apparatus of a power domain model according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be obtained by a person skilled in the art without inventive step based on the embodiments of the present invention, are within the scope of protection of the present invention.
It is to be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or apparatus comprising the element. Furthermore, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. In the description of the following examples, "plurality" means two or more unless specifically limited otherwise.
Referring to fig. 1, an embodiment of the present invention provides a method for pre-training a model in an electric power domain, where the method includes:
s101: acquiring original electric power corpus data;
s102: processing the original electric power corpus data, wherein the processing at least comprises word segmentation processing;
s103: constructing a pre-training corpus of the electric power field model by adopting a full-word masking method for the electric power corpus data obtained after processing;
s104: constructing an electric power field model, wherein the electric power field model comprises an attention matrix, and the attention matrix introduces word-to-word relative position codes;
s105: and pre-training the electric power field model by utilizing the pre-training corpus.
Specifically, the power domain model may be a power domain large model, i.e., a large-scale model for the power field. The original electric power corpus data can be a large amount of power-related text. The processing can also comprise cleaning, which can be carried out before the word segmentation operation with toolkits such as regular-expression matching and Beautiful Soup, and which filters out special symbols in the original electric power corpus data, including garbled characters, HTML tags and the like, so as to obtain cleaner corpus data.
When the electric power field model is pre-trained, it is used to predict the words masked in the pre-training corpus constructed by the full-word masking method; the prediction result is compared with the original words before masking, and the parameters of the electric power field model are adjusted according to the comparison result.
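As an illustration of this training step, the following minimal sketch (assuming PyTorch; the model interface, the tensor names input_ids and labels, and the use of -100 as the ignored-label value are illustrative assumptions, not prescribed by the invention) predicts the masked positions and updates the parameters from the comparison with the original words:

import torch.nn.functional as F

def pretraining_step(model, optimizer, input_ids, labels):
    # input_ids: (batch, seq_len) token ids in which the selected whole words have been
    #            replaced by [MASK], a random character, or kept unchanged
    # labels:    (batch, seq_len) original token ids at the masked positions, -100 elsewhere
    logits = model(input_ids)                       # (batch, seq_len, vocab_size)
    loss = F.cross_entropy(logits.view(-1, logits.size(-1)),
                           labels.view(-1),
                           ignore_index=-100)       # compare predictions with the pre-masking words
    optimizer.zero_grad()
    loss.backward()                                 # adjust the model parameters from the comparison
    optimizer.step()
    return loss.item()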
In the embodiment of the invention, the pre-training corpus of the electric power field model is constructed in a full-word shielding mode, so that the problem that when the pre-training corpus of the electric power field model is constructed in a character shielding mode, the model can easily guess shielded words and neglect semantic information between the words and the whole sentence is solved, and the migration capability of the pre-training model can be improved. In addition, the embodiment of the invention also introduces the relative position modeling between words into the constructed pre-training model, namely the electric power field model, and particularly increases the attention matrix introducing the relative position coding between words, so that the model can pay more attention to the relative position between words, and is more sensitive to the relative position between words, and the pre-training electric power field model is not only suitable for the masking word prediction task in the pre-training stage, but also is easier to migrate to the downstream task.
In some specific embodiments, the algorithm formula of the attention matrix introduced with relative position encoding between words is:
Attention_rel(Q, K, V) = Attention(Q, K, V) + rel
wherein Attention(Q, K, V) is the algorithm formula of the attention matrix without the relative position encoding, i.e., the attention matrix computed for one attention head, and rel is a parameter related to the relative position between words; for each input sample (i.e., a piece of pre-training corpus), rel takes a scalar value per attention head.
In particular,
Attention(Q, K, V) = softmax(QK^T / √d_k) V
Q, K, and V represent the Query, Key, and Value respectively: V is a vector representing the input features, and Q and K are the feature vectors used to compute the attention weights, both derived from the input features. Attention(Q, K, V) multiplies V by the corresponding weights according to the degree of attention: the similarity between the current Query and all Keys is computed, a set of weights is obtained by passing the similarity values through a softmax layer, and the attended Value is obtained by summing the products of these weights and the corresponding Values. Q, K, and V are obtained by passing the input vector X through the matrices W_Q, W_K, and W_V, where W_Q, W_K, and W_V are three trainable parameter matrices. d_k is the dimension of K.
In the embodiment of the invention, the relative position encoding adopts the encoding scheme of T5, and a position offset is introduced into the attention matrix, i.e., a relative position bias rel is added on top of the attention matrix.
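The following sketch illustrates one way such a relative position bias can be added to a single attention head, assuming PyTorch; clipping the relative distance is a simplification of T5's logarithmic bucketing, and all class and parameter names are hypothetical:

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelPosAttentionHead(nn.Module):
    # Single attention head with an additive relative-position bias on the attention logits.
    def __init__(self, hidden_size, head_dim, max_distance=128):
        super().__init__()
        self.W_q = nn.Linear(hidden_size, head_dim, bias=False)
        self.W_k = nn.Linear(hidden_size, head_dim, bias=False)
        self.W_v = nn.Linear(hidden_size, head_dim, bias=False)
        self.max_distance = max_distance
        # one learnable scalar bias per clipped relative distance
        self.rel_bias = nn.Embedding(2 * max_distance + 1, 1)

    def forward(self, x):                                        # x: (batch, seq_len, hidden_size)
        q, k, v = self.W_q(x), self.W_k(x), self.W_v(x)
        d_k = q.size(-1)
        scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)        # Attention(Q, K, V) logits
        pos = torch.arange(x.size(1), device=x.device)
        rel = (pos[None, :] - pos[:, None]).clamp(-self.max_distance, self.max_distance)
        scores = scores + self.rel_bias(rel + self.max_distance).squeeze(-1)  # + rel
        return F.softmax(scores, dim=-1) @ v

A complete model would apply this per attention head; the text above only requires that a relative position bias rel be added on top of the attention matrix.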
In some specific embodiments, the processing the original power corpus data includes:
and performing word segmentation processing on the original power corpus data by adopting a BERT-CRF model and a power field dictionary, wherein the BERT-CRF model is obtained by training power word segmentation corpus.
The BERT-CRF model obtained by training on the power word segmentation corpus serves as a word segmentation tool for the electric power field. The BERT model is a common pre-trained language model in the field of natural language processing; its full name is Bidirectional Encoder Representations from Transformers. CRF stands for Conditional Random Field, a traditional machine learning method. The BERT-CRF model adopts the BMES coding scheme, where B indicates that the current character is the beginning character of a multi-character word, M indicates that it is a middle character of a multi-character word, E indicates that it is the ending character of a multi-character word, and S indicates that it is a single-character word. For example, "overhaul specification of transformer" is labeled as "B, M, E, S, B, E", and the corresponding word segmentation result is: "transformer / overhaul / norm". The electric power field dictionary is also referred to as the power dictionary. In the embodiment of the invention, the BERT-CRF model is used to segment the original electric power corpus data into words, and the power dictionary is then used to combine the segmented power words to obtain the final word segmentation result. The original power corpus data to which the word segmentation is applied here may be power corpus data that has already been cleaned. Referring to fig. 2, the word segmentation process yields a word sequence composed of a series of words.
In the embodiment of the invention, the BERT-CRF model trained on the power word segmentation corpus and the electric power field dictionary are used together to perform word segmentation on the original electric power corpus data, so that an entity in the electric power field can be segmented as a whole and power-specific terms are prevented from being split apart as far as possible.
In other optional embodiments, other electric power domain word segmentation tools may instead be adopted, in combination with the electric power domain dictionary, to perform word segmentation on the original electric power corpus data.
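As a concrete sketch of the segmentation flow described above (BMES decoding followed by dictionary-based merging), the following Python functions are illustrative; the greedy longest-match merge and the helper names bmes_to_words and merge_with_dictionary are assumptions, since the text does not prescribe a particular merging strategy:

def bmes_to_words(chars, tags):
    # Convert per-character BMES tags (e.g. produced by a BERT-CRF tagger) into words.
    words, buf = [], ""
    for ch, tag in zip(chars, tags):
        if tag == "S":
            words.append(ch)
        elif tag == "B":
            buf = ch
        elif tag == "M":
            buf += ch
        else:                      # "E": ending character of a multi-character word
            words.append(buf + ch)
            buf = ""
    if buf:                        # tolerate a dangling B/M at the end of the sequence
        words.append(buf)
    return words

def merge_with_dictionary(words, power_dict, max_span=4):
    # Greedily merge adjacent segments that form an entry of the power-field dictionary,
    # so that domain entities are kept as a whole.
    merged, i = [], 0
    while i < len(words):
        for span in range(min(max_span, len(words) - i), 1, -1):
            candidate = "".join(words[i:i + span])
            if candidate in power_dict:
                merged.append(candidate)
                i += span
                break
        else:
            merged.append(words[i])
            i += 1
    return merged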
In the traditional model pre-training stage, a character-level masking method rather than full-word masking is adopted, which may mask only part of a word. For example, under character masking, the "overhaul specification of a transformer" (a sequence of characters that can be glossed character by character as "change", "press", "device", "detection", "modification", "gauge", "norm") may become: "change" "[MASK]" "device" "detection" "modification" "gauge" "norm", where the "press" character of "transformer" is masked on its own. This can make the model focus on local character information: in the above example, the model can guess the "press" character from the neighboring "change" and "device" characters, and thus ignore the semantic information between the word and the whole sentence. Full-word masking instead masks the whole power term, and the above example becomes: "[MASK]" "[MASK]" "[MASK]" "detection" "modification" "gauge" "norm". In order to predict the masked power term "transformer", the model needs to mine the semantic information of the masked word from the whole sentence, which forces it to establish the semantic relation between the power term and the whole sentence.
In some embodiments of the present invention, the constructing the pre-training corpus of the electric power domain model by using a full-word masking method on the electric power corpus data obtained after the processing includes:
and carrying out random whole-word masking on the electric power corpus data obtained after the processing by adopting a preset probability, replacing one part of characters corresponding to all words to be masked with random characters, replacing the other part of the characters with masking symbols, and keeping the rest part of the characters unchanged.
For example, random whole-word masking may be performed on the word sequence obtained after word segmentation with a probability of 0.15, and the characters corresponding to all words selected for masking are processed as follows: 80% are replaced with a masking symbol (e.g., [MASK]), 10% are replaced with a random character, and the remaining 10% keep the original character.
In addition, in the embodiment of the present invention, the electric power domain model may be constructed based on a BERT model. Therefore, in order to maintain consistency of model training, when the pre-training corpus of the electric power domain model is constructed by the full-word masking method, a special symbol [CLS] is added at the beginning and a special symbol [SEP] is added at the end of each sentence that has undergone the full-word masking processing.
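A minimal sketch of this corpus-construction step is given below, assuming plain Python; the 0.15 selection probability and the 80%/10%/10% split follow the example above, while the function name, the use of None for unmasked targets, and the vocabulary argument are illustrative assumptions:

import random

MASK, CLS, SEP = "[MASK]", "[CLS]", "[SEP]"

def whole_word_mask(words, vocab, p_select=0.15):
    # words: the word sequence produced by segmentation; each word is a string of characters.
    # Returns the masked character sequence and the original characters as prediction targets.
    masked, targets = [CLS], [None]
    for word in words:
        if random.random() < p_select:                   # select the whole word for masking
            for ch in word:
                r = random.random()
                if r < 0.8:
                    masked.append(MASK)                  # 80%: masking symbol
                elif r < 0.9:
                    masked.append(random.choice(vocab))  # 10%: random character
                else:
                    masked.append(ch)                    # 10%: keep the original character
                targets.append(ch)                       # every character of the word must be recovered
        else:
            masked.extend(word)                          # unmasked word, character by character
            targets.extend([None] * len(word))
    masked.append(SEP)
    targets.append(None)
    return masked, targets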
Referring to fig. 3, an embodiment of the present invention further provides a method for fine tuning a power domain model, including:
s301: constructing a training data set aiming at a downstream task;
s302: the method comprises the steps that other network structures (namely coding layers of a pre-training model in the power field) except an output layer in the pre-training model in the power field are used as bottom layer encoders, an output layer network structure is built according to downstream tasks, the output layer network structure is connected to the bottom layer encoders to obtain the power field model aiming at the downstream tasks, pre-training corpora of the pre-training model in the power field are obtained by carrying out word segmentation on original power corpus data and then adopting full-word masking, the pre-training model in the power field comprises an attention matrix, and relative position codes between words are introduced into the attention matrix;
s303: and training the power domain model aiming at the downstream task by utilizing the training data set.
Specifically, the electric power domain pre-training model may be obtained by pre-training using any one of the electric power domain model pre-training methods described in the above embodiments.
In the embodiment of the invention, the pre-training corpus of the electric power field model is constructed in a full-word shielding mode, so that the problem that when the pre-training corpus of the electric power field model is constructed in a character shielding mode, the model can easily guess shielded words and neglect semantic information between the words and the whole sentence is solved, and the migration capability of the pre-training model can be improved. In addition, the embodiment of the invention also introduces the relative position modeling between words into the constructed pre-training model, namely the electric power field model, and particularly increases the attention matrix introducing the relative position coding between words, so that the model can pay more attention to the relative position between words, and is more sensitive to the relative position between words, and the pre-training electric power field model is not only suitable for the masking word prediction task in the pre-training stage, but also is easier to migrate to the downstream task.
In the embodiment of the invention, in the fine-tuning stage of the model in the electric power field, different output-layer network structures need to be designed for different downstream tasks. The following takes common natural language processing tasks as examples.
In some specific embodiments, the downstream task is a classification task, and the output layer network structure is a fully-connected network; and a first network structure is also included between the bottom layer encoder and the fully connected network;
the first network structure is used for extracting coding vectors of a first layer and a last layer in the bottom layer coder, averaging the coding vectors to obtain a first coding vector, and averaging the first coding vectors of all words to obtain the coding vector of the bottom layer coder;
the fully-connected network is used for outputting the confidence corresponding to each category based on the coding vector of the bottom layer coder.
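The following sketch, assuming PyTorch and an encoder that exposes its per-layer hidden states (as BERT-style encoders typically do), shows one way to realize the first network structure and the fully connected layer; whether the "first layer" means the embedding output or the first Transformer layer is an implementation detail not fixed by the text, and all names are illustrative:

import torch.nn as nn

class ClassificationHead(nn.Module):
    # First/last-layer averaging followed by a fully connected layer, as described above.
    def __init__(self, hidden_size, num_classes):
        super().__init__()
        self.fc = nn.Linear(hidden_size, num_classes)

    def forward(self, all_hidden_states):
        # all_hidden_states: list/tuple of per-layer tensors, each (batch, seq_len, hidden_size)
        first, last = all_hidden_states[0], all_hidden_states[-1]
        token_vec = (first + last) / 2          # average the first- and last-layer encodings
        sent_vec = token_vec.mean(dim=1)        # average over all tokens -> (batch, hidden_size)
        return self.fc(sent_vec)                # logits; apply softmax for per-category confidence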
In other specific embodiments, the downstream task is a sequence tagging task, the output-layer network structure is a Conditional Random Field (CRF), and a Dropout layer and a mapping layer are further included between the bottom-layer encoder and the conditional random field layer;
the output of the bottom layer encoder is a (batch_size, time_steps, hidden_size)-shaped tensor, wherein batch_size is the batch size, time_steps is the sequence length, and hidden_size is the hidden layer unit size of the bottom layer encoder;
the output of the bottom layer encoder is converted into a (batch_size, time_steps, num_classes)-shaped tensor through the Dropout layer and the mapping layer, wherein num_classes is the number of target classes;
the conditional random field layer is used to derive a label for each element in the entire sequence based on the (batch_size, time_steps, num_classes)-shaped tensor. The whole sequence refers to the sequence to be labeled that is input to the power field model for the sequence labeling task.
The conditional random field serves as the labeling structure of the sequence labeling task. The Dropout layer is used to zero out elements of the (batch_size, time_steps, hidden_size)-shaped tensor output by the underlying encoder with a certain probability, which can increase the robustness of the model. The tensor passing through Dropout is converted into a (batch_size, time_steps, num_classes)-shaped tensor by the mapping layer.
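A corresponding sketch for the sequence labeling head is shown below, assuming PyTorch and the third-party pytorch-crf package for the CRF layer; the class and argument names are illustrative:

import torch.nn as nn
from torchcrf import CRF   # third-party "pytorch-crf" package, assumed here for the CRF layer

class SequenceLabelingHead(nn.Module):
    # Dropout -> mapping (linear) layer -> CRF, matching the tensor shapes described above.
    def __init__(self, hidden_size, num_classes, dropout=0.1):
        super().__init__()
        self.dropout = nn.Dropout(dropout)
        self.mapping = nn.Linear(hidden_size, num_classes)
        self.crf = CRF(num_classes, batch_first=True)

    def forward(self, encoder_out, tags=None, mask=None):
        # encoder_out: (batch_size, time_steps, hidden_size) tensor from the bottom-layer encoder
        # mask: optional (batch_size, time_steps) boolean tensor marking real (non-padding) positions
        emissions = self.mapping(self.dropout(encoder_out))   # -> (batch_size, time_steps, num_classes)
        if tags is not None:                                   # training: negative log-likelihood loss
            return -self.crf(emissions, tags, mask=mask)
        return self.crf.decode(emissions, mask=mask)           # inference: a label for each element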
Accordingly, referring to fig. 4, an embodiment of the present invention provides a pre-training apparatus for a model in an electric power domain, including:
an obtaining module 401, configured to obtain original power corpus data;
a processing module 402, configured to process the original power corpus data, where the processing at least includes word segmentation processing;
a first building module 403, configured to build a pre-training corpus of the power domain model by using a full-word masking method for the processed power corpus data;
a second building module 404, configured to build an electric power domain model, where the electric power domain model includes an attention matrix, and the attention matrix introduces word-to-word relative position codes;
and a pre-training module 405, configured to pre-train the power domain model by using the pre-training corpus.
In the embodiment of the invention, the pre-training corpus of the electric power field model is constructed by full-word masking. This avoids the problem that arises when the pre-training corpus is constructed by masking individual characters, namely that the model can easily guess the masked characters and thus neglect the semantic information between words and the whole sentence, so the migration capability of the pre-training model can be improved. In addition, the embodiment of the invention introduces relative position modeling between words into the constructed pre-training model, i.e., the electric power field model, specifically by adding an attention matrix that incorporates relative position encoding between words. The model therefore pays more attention, and is more sensitive, to the relative positions between words, so that the pre-trained electric power field model is not only suitable for the masked-word prediction task of the pre-training stage but is also easier to migrate to downstream tasks.
In some specific embodiments, the algorithm formula of the attention matrix introduced with relative position encoding between words is:
Attention_rel(Q, K, V) = Attention(Q, K, V) + rel
wherein Attention(Q, K, V) is the algorithm formula of the attention matrix without the relative position encoding, and rel is a parameter related to the relative position between words.
In some specific embodiments, the processing module 402 is configured to perform word segmentation on the raw power corpus data by using a BERT-CRF model and a power domain dictionary, where the BERT-CRF model is obtained by training using a power word segmentation corpus.
In some specific embodiments, the first building module 403 includes:
and the masking unit is used for carrying out random whole-word masking on the electric power corpus data obtained after the processing with a preset probability, replacing one part of the characters corresponding to all the words to be masked with random characters, replacing another part of the characters with masking symbols, and keeping the remaining characters unchanged.
The embodiment of the present invention is an embodiment of an apparatus based on the same inventive concept as the embodiment of the pre-training method for the model in the power domain, and therefore, for specific technical details and corresponding technical effects, please refer to the embodiment of the pre-training method for the model in the power domain, which is not described herein again.
Accordingly, referring to fig. 5, an embodiment of the present invention provides a fine tuning apparatus for an electric power domain model, including:
a third constructing module 501, configured to construct a training data set for a downstream task;
a fourth constructing module 502, configured to use the network structures other than the output layer in the pre-training model in the power field as the bottom-layer encoder, construct an output-layer network structure according to the downstream task, and connect the output-layer network structure to the bottom-layer encoder to obtain a power field model for the downstream task, where the pre-training corpus of the pre-training model in the power field is obtained by performing word segmentation processing on original power corpus data and then applying full-word masking, the pre-training model in the power field includes an attention matrix, and the attention matrix introduces relative position encoding between words;
a training module 503, configured to train the power domain model for the downstream task by using the training dataset.
In the embodiment of the invention, the pre-training corpus of the electric power field model is constructed by full-word masking. This avoids the problem that arises when the pre-training corpus is constructed by masking individual characters, namely that the model can easily guess the masked characters and thus neglect the semantic information between words and the whole sentence, so the migration capability of the pre-training model can be improved. In addition, the embodiment of the invention introduces relative position modeling between words into the constructed pre-training model, i.e., the electric power field model, specifically by adding an attention matrix that incorporates relative position encoding between words. The model therefore pays more attention, and is more sensitive, to the relative positions between words, so that the pre-trained electric power field model is not only suitable for the masked-word prediction task of the pre-training stage but is also easier to migrate to downstream tasks.
In some specific embodiments, the downstream task is a classification task, and the output layer network structure is a fully-connected network; a first network structure is also arranged between the bottom layer encoder and the full-connection network;
the first network structure is used for extracting coding vectors of a first layer and a last layer in the bottom layer coder and averaging to obtain a first coding vector, and then averaging the first coding vectors of all words to obtain the coding vector of the bottom layer coder;
the fully-connected network is used for outputting the confidence corresponding to each category based on the coding vector of the bottom layer coder.
In some specific embodiments, the downstream task is a sequence labeling task, the output-layer network structure is a conditional random field, and a Dropout layer and a mapping layer are further included between the bottom-layer encoder and the conditional random field layer;
the output of the bottom layer encoder is a (batch_size, time_steps, hidden_size)-shaped tensor, wherein batch_size is the batch size, time_steps is the sequence length, and hidden_size is the hidden layer unit size of the bottom layer encoder;
the output of the bottom layer encoder is converted into a (batch_size, time_steps, num_classes)-shaped tensor through the Dropout layer and the mapping layer, wherein num_classes is the number of target classes;
the conditional random field layer is used to derive a label for each element in the entire sequence based on the (batch_size, time_steps, num_classes)-shaped tensor.
The embodiment of the present invention is a device embodiment based on the same inventive concept as the embodiment of the fine tuning method for the model in the power field, and therefore specific technical details and corresponding technical effects are referred to the embodiment of the fine tuning method for the model in the power field, and are not described herein again.
An embodiment of the present invention further provides an electronic device, as shown in fig. 6, the electronic device may include a processor 61 and a memory 62, where the processor 61 and the memory 62 may be communicatively connected to each other through a bus or in another manner, and fig. 6 illustrates an example of a connection through a bus.
The processor 61 may be a Central Processing Unit (CPU). The processor 61 may also be other general-purpose processors, Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, or combinations thereof.
The memory 62, as a non-transitory computer readable storage medium, may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules corresponding to the pre-training method of the power domain model in the embodiment of the present invention (e.g., the obtaining module 401, the processing module 402, the first building module 403, the second building module 404, and the pre-training module 405 shown in fig. 4) or program instructions/modules corresponding to the fine-tuning method of the power domain model in the embodiment of the present invention (e.g., the third building module 501, the fourth building module 502, and the training module 503 shown in fig. 5). The processor 61 executes various functional applications and data processing of the processor by running non-transitory software programs, instructions and modules stored in the memory 62, that is, implements the power domain model pre-training method or the fine tuning method of the power domain model in the above method embodiments.
The memory 62 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created by the processor 61, and the like. Further, the memory 62 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 62 may optionally include memory located remotely from the processor 61, and these remote memories may be connected to the processor 61 via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The one or more modules are stored in the memory 62 and when executed by the processor 61, perform the power domain model pre-training method or the fine tuning method of the power domain model in the above-described method embodiments.
The specific details of the electronic device may be understood by referring to the corresponding related description and effects in the foregoing method embodiments, which are not described herein again.
Accordingly, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium is used to store a computer program, and when the computer program is executed by a processor, the computer program implements each process of the foregoing electric power field model pre-training method embodiment or implements each process of the foregoing electric power field model fine tuning method embodiment, and can achieve the same technical effect, and in order to avoid repetition, the computer program is not described herein again.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (7)

1. A fine tuning method of a power domain model is characterized by comprising the following steps:
constructing a training data set aiming at a downstream task;
taking other network structures except an output layer in a pre-training model in the power field as a bottom-layer encoder, constructing an output layer network structure according to the downstream task, connecting the output layer network structure to the bottom-layer encoder, and then obtaining a power field model for the downstream task, wherein pre-training linguistic data of the pre-training model in the power field is obtained by carrying out word segmentation on original power linguistic data and then adopting full-word masking, the pre-training model in the power field comprises an attention matrix, and the attention matrix introduces relative position codes between words;
training the power domain model for the downstream task by using the training data set;
wherein the algorithm formula of the attention matrix introduced with the relative position codes among the words is as follows:
Attention_rel(Q, K, V) = Attention(Q, K, V) + rel
wherein Attention(Q, K, V) is the algorithm formula of the attention matrix without the introduction of the relative position codes, V is the vector of input features, Q and K are the feature vectors for computing the attention weights, and rel is a parameter related to the relative position between words;
when the downstream task is a classification task, the output layer network structure is a full-connection network; a first network structure is also arranged between the bottom layer encoder and the full-connection network;
the first network structure is used for extracting coding vectors of a first layer and a last layer in the bottom layer coder and averaging to obtain a first coding vector, and then averaging the first coding vectors of all words to obtain the coding vector of the bottom layer coder;
the fully-connected network is used for outputting a confidence coefficient corresponding to each category based on the coding vector of the bottom layer coder;
when the downstream task is a sequence labeling task, the output layer network structure is a conditional random field, and a Dropout layer and a mapping layer are also arranged between the bottom encoder and the conditional random field layer;
the output of the bottom layer encoder is a (batch_size, time_steps, hidden_size)-shaped tensor, wherein batch_size is the batch size, time_steps is the sequence length, and hidden_size is the hidden layer unit size of the bottom layer encoder;
the output of the bottom layer encoder is converted into a (batch_size, time_steps, num_classes)-shaped tensor through the Dropout layer and the mapping layer, wherein num_classes is the number of target classes;
the conditional random field layer is used to derive a label for each element in the entire sequence based on the (batch_size, time_steps, num_classes)-shaped tensor.
2. The method of claim 1, wherein the electric power domain pre-training model is obtained by:
acquiring original electric power corpus data;
processing the original electric power corpus data, wherein the processing at least comprises word segmentation processing;
constructing a pre-training corpus of the electric power field model by adopting a full-word masking method for the electric power corpus data obtained after processing;
constructing a power field model;
and pre-training the electric power field model by utilizing the pre-training corpus.
3. The method according to claim 2, wherein the processing the raw power corpus data comprises:
and performing word segmentation processing on the original power corpus data by adopting a BERT-CRF model and a power field dictionary, wherein the BERT-CRF model is obtained by training through power word segmentation corpuses.
4. The method according to claim 2, wherein the step of constructing the pre-training corpus of the electric power domain model by using a full-word masking method for the processed electric power corpus data comprises:
and carrying out random whole-word masking on the electric power corpus data obtained after the processing by adopting a preset probability, replacing one part of characters corresponding to all words needing masking with random characters, replacing the other part of characters with masking symbols, and keeping the rest part of characters unchanged.
5. A fine tuning device of a power domain model is characterized by comprising:
the third construction module is used for constructing a training data set aiming at the downstream task;
the fourth construction module is used for taking other network structures except an output layer in the electric power field pre-training model as a bottom-layer encoder, constructing an output layer network structure according to the downstream task, connecting the output layer network structure to the bottom-layer encoder, and then obtaining an electric power field model aiming at the downstream task, wherein the pre-training corpus of the electric power field pre-training model is obtained by carrying out word segmentation processing on original electric power corpus data and then adopting full-word masking, the electric power field pre-training model comprises an attention matrix, and the attention matrix introduces relative position codes between words;
the training module is used for training the power domain model aiming at the downstream task by utilizing the data set for training;
wherein the algorithm formula of the attention matrix introduced with the relative position codes among the words is as follows:
Attention_rel(Q, K, V) = Attention(Q, K, V) + rel
wherein Attention(Q, K, V) is the algorithm formula of the attention matrix without the introduction of the relative position codes, V is the vector of input features, Q and K are the feature vectors for computing the attention weights, and rel is a parameter related to the relative position between words;
when the downstream task is a classification task, the output layer network structure is a full-connection network; a first network structure is also arranged between the bottom layer encoder and the full-connection network;
the first network structure is used for extracting coding vectors of a first layer and a last layer in the bottom layer coder and averaging to obtain a first coding vector, and then averaging the first coding vectors of all words to obtain the coding vector of the bottom layer coder;
the fully-connected network is used for outputting a confidence coefficient corresponding to each category based on the coding vector of the bottom layer coder;
when the downstream task is a sequence labeling task, the output layer network structure is a conditional random field, and a Dropout layer and a mapping layer are also arranged between the bottom encoder and the conditional random field layer;
the output of the bottom layer encoder is a (batch_size, time_steps, hidden_size)-shaped tensor, wherein batch_size is the batch size, time_steps is the sequence length, and hidden_size is the hidden layer unit size of the bottom layer encoder;
the output of the bottom layer encoder is converted into a (batch_size, time_steps, num_classes)-shaped tensor through the Dropout layer and the mapping layer, wherein num_classes is the number of target classes;
the conditional random field layer is used to derive a label for each element in the entire sequence based on the (batch_size, time_steps, num_classes)-shaped tensor.
6. An electronic device, comprising:
a memory and a processor, the memory and the processor being communicatively connected to each other, the memory being configured to store a computer program, which when executed by the processor, implements the method of fine tuning of the power domain model of any of claims 1 to 4.
7. A computer-readable storage medium for storing a computer program which, when executed by a processor, implements the method of fine tuning of a power domain model of any of claims 1 to 4.
CN202211060951.9A 2022-09-01 2022-09-01 Electric power field model pre-training method, fine tuning method, device and equipment Active CN115129826B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202211060951.9A CN115129826B (en) 2022-09-01 2022-09-01 Electric power field model pre-training method, fine tuning method, device and equipment
PCT/CN2023/115522 WO2024046316A1 (en) 2022-09-01 2023-08-29 Power domain model pre-training method and apparatus, and fine-tuning method and apparatus, device, storage medium and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211060951.9A CN115129826B (en) 2022-09-01 2022-09-01 Electric power field model pre-training method, fine tuning method, device and equipment

Publications (2)

Publication Number Publication Date
CN115129826A CN115129826A (en) 2022-09-30
CN115129826B (en) 2022-11-22

Family

ID=83387399

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211060951.9A Active CN115129826B (en) 2022-09-01 2022-09-01 Electric power field model pre-training method, fine tuning method, device and equipment

Country Status (2)

Country Link
CN (1) CN115129826B (en)
WO (1) WO2024046316A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115129826B (en) * 2022-09-01 2022-11-22 国网智能电网研究院有限公司 Electric power field model pre-training method, fine tuning method, device and equipment

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11526679B2 (en) * 2020-04-24 2022-12-13 Microsoft Technology Licensing, Llc Efficient transformer language models with disentangled attention and multi-step decoding
CN112487145B (en) * 2020-12-01 2022-07-29 重庆邮电大学 O2O shop food safety monitoring method
CN112632252B (en) * 2020-12-25 2021-09-17 中电金信软件有限公司 Dialogue response method, dialogue response device, computer equipment and storage medium
CN112632972B (en) * 2020-12-25 2024-03-15 浙江国际海运职业技术学院 Method for rapidly extracting fault information in power grid equipment fault report
CN112612881B (en) * 2020-12-28 2022-03-25 电子科技大学 Chinese intelligent dialogue method based on Transformer
JP2024503518A (en) * 2021-01-20 2024-01-25 オラクル・インターナショナル・コーポレイション Context tag integration using named entity recognition model
CN113239700A (en) * 2021-04-27 2021-08-10 哈尔滨理工大学 Text semantic matching device, system, method and storage medium for improving BERT
CN113642330B (en) * 2021-07-19 2024-04-30 西安理工大学 Rail transit standard entity identification method based on catalogue theme classification
CN114386410B (en) * 2022-01-11 2023-07-11 腾讯科技(深圳)有限公司 Training method of pre-training model and text processing method
CN114579695A (en) * 2022-01-20 2022-06-03 杭州量知数据科技有限公司 Event extraction method, device, equipment and storage medium
CN114647715A (en) * 2022-04-07 2022-06-21 杭州电子科技大学 Entity recognition method based on pre-training language model
CN114722208B (en) * 2022-06-08 2022-11-01 成都健康医联信息产业有限公司 Automatic classification and safety level grading method for health medical texts
CN115129826B (en) * 2022-09-01 2022-11-22 国网智能电网研究院有限公司 Electric power field model pre-training method, fine tuning method, device and equipment

Also Published As

Publication number Publication date
CN115129826A (en) 2022-09-30
WO2024046316A1 (en) 2024-03-07

Similar Documents

Publication Publication Date Title
CN112528672B (en) Aspect-level emotion analysis method and device based on graph convolution neural network
CN106502985B (en) neural network modeling method and device for generating titles
WO2022057776A1 (en) Model compression method and apparatus
CN113642330A (en) Rail transit standard entity identification method based on catalog topic classification
CN109214006B (en) Natural language reasoning method for image enhanced hierarchical semantic representation
Pramanik et al. Text normalization using memory augmented neural networks
CN110688854A (en) Named entity recognition method, device and computer readable storage medium
CN112446211A (en) Text processing device, method, apparatus, and computer-readable storage medium
WO2023137911A1 (en) Intention classification method and apparatus based on small-sample corpus, and computer device
CN112364639B (en) Context-sensitive paraphrasing generation method and system based on pre-training language model
CN115129826B (en) Electric power field model pre-training method, fine tuning method, device and equipment
CN115759254A (en) Question-answering method, system and medium based on knowledge-enhanced generative language model
CN114281982B (en) Book propaganda abstract generation method and system adopting multi-mode fusion technology
CN113254586B (en) Unsupervised text retrieval method based on deep learning
CN114492661A (en) Text data classification method and device, computer equipment and storage medium
CN116680575B (en) Model processing method, device, equipment and storage medium
CN109117471B (en) Word relevancy calculation method and terminal
CN117875395A (en) Training method, device and storage medium of multi-mode pre-training model
CN117668157A (en) Retrieval enhancement method, device, equipment and medium based on knowledge graph
CN117056494A (en) Open domain question and answer method, device, electronic equipment and computer storage medium
CN116484851A (en) Pre-training model training method and device based on variant character detection
CN113704466B (en) Text multi-label classification method and device based on iterative network and electronic equipment
CN113449517B (en) Entity relationship extraction method based on BERT gated multi-window attention network model
Sileo et al. Composition of sentence embeddings: Lessons from statistical relational learning
CN114519353A (en) Model training method, emotion message generation device, emotion message generation equipment and emotion message generation medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20230412

Address after: 102209 18 Riverside Avenue, Changping District science and Technology City, Beijing

Patentee after: State Grid Smart Grid Research Institute Co.,Ltd.

Patentee after: STATE GRID CORPORATION OF CHINA

Address before: 102209 18 Riverside Avenue, Changping District science and Technology City, Beijing

Patentee before: State Grid Smart Grid Research Institute Co.,Ltd.