CN117009516A - Converter station fault strategy model training method, pushing method and device - Google Patents
- Publication number
- CN117009516A (application CN202310811621.7A)
- Authority
- CN
- China
- Prior art keywords
- fault
- vector
- converter station
- data
- label
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2455—Query execution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Abstract
The application relates to a converter station fault strategy model training method, a pushing method and a device. The training method comprises the following steps: acquiring a fault data set of a converter station, wherein the fault data set comprises fault cases and an operation and maintenance corpus of the converter station; obtaining a plurality of groups of keywords based on the fault cases and the operation and maintenance corpus; determining a plurality of fault data vectors corresponding to the converter station based on the plurality of groups of keywords; labeling each fault data vector to obtain a label type sequence corresponding to each fault data vector; and taking the fault data vectors and the label type sequences as training data, training a text classification model based on the training data, and taking the trained text classification model as a fault strategy model. The method can improve the utilization efficiency of existing fault cases when handling sudden problems.
Description
Technical Field
The present application relates to the field of artificial intelligence technology, and in particular, to a converter station fault policy model training method, a pushing method, an apparatus, a computer device, a storage medium, and a computer program product.
Background
Along with the development of the electric power market and the interconnection of power grids, the construction scale and number of converter stations continue to grow. Because a converter station contains a large amount of equipment that operates over long periods, the types of faults and fault handling cases are diverse. Operation and maintenance personnel can diagnose and check the running state of equipment or the cause of a fault by referring to existing fault cases, which provide experience and examples for dealing with emergencies.
However, finding a relevant existing fault case currently requires searching through a large number of documents, so a sudden problem cannot be solved in time, and fault handling cases and emergency plan texts cannot be put to good use in coping with sudden faults.
Therefore, how to improve the utilization efficiency of existing fault cases when handling sudden problems is a technical problem to be solved.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a converter station fault policy model training method, a pushing method, an apparatus, a computer device, a computer readable storage medium, and a computer program product that can improve the efficiency of utilization of existing fault cases when handling sudden problems.
In a first aspect, the application provides a converter station fault strategy model training method. The method comprises the following steps:
acquiring a fault data set of a converter station, wherein the fault data set comprises fault cases and an operation and maintenance corpus of the converter station; the operation and maintenance corpus comprises fault operation and maintenance information of the converter station;
obtaining a plurality of groups of keywords based on the fault cases and the operation and maintenance corpus, wherein one group of keywords is used for representing one fault; determining a plurality of fault data vectors corresponding to the converter station based on the plurality of groups of keywords;
labeling each fault data vector to obtain a label type sequence corresponding to each fault data vector; wherein the label type sequence includes at least two of a fault type label, a fault feature label, an inspection information label, and a disposal measure label; the disposal measure label is a binary label and comprises operation and maintenance role information and disposal information for the fault;
taking the fault data vectors and the label type sequences as training data, training a text classification model based on the training data, and taking the trained text classification model as a fault strategy model; the fault strategy model is used for identifying structured fault data information from unstructured fault information, wherein the structured fault data information corresponds to information in the label type sequence.
In one embodiment, the labeling each fault data vector to obtain a label type sequence corresponding to each fault data vector includes:
obtaining a label type sequence template; the label type sequence template consists of a plurality of label templates, and each label template corresponds to at least one preset label;
if a sub-vector of the fault data vector matches a target preset label of a label template, labeling the sub-vector of the fault data vector with the target preset label;
and after all sub-vectors of the fault data vector have been labeled against the plurality of label templates, obtaining the label type sequence corresponding to the labeled fault data vector.
In one embodiment, the labeling the sub-vector of the fault data vector with the target preset label if the sub-vector matches the target preset label of the label template includes:
calculating the spatial distance between the sub-vector of the fault data vector and the label feature vector corresponding to the target preset label;
if the spatial distance is smaller than a preset distance value, determining that the label feature vector matches the sub-vector of the fault data vector, and labeling the sub-vector with the label type corresponding to the label feature vector.
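The distance-based matching described in this embodiment can be illustrated with a minimal Python sketch. The text does not specify the distance metric or how ties between several preset labels are resolved, so the Euclidean distance, the "closest label wins" rule, and all names below (`match_label`, `label_fault_vector`, the example label keys) are assumptions made for illustration only.

```python
import math
from typing import Dict, List, Optional, Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_label(
    sub_vector: Sequence[float],
    label_vectors: Dict[str, Sequence[float]],
    max_distance: float,
) -> Optional[str]:
    """Return the closest preset label whose distance is below max_distance, else None."""
    best_label, best_distance = None, max_distance
    for label, feature_vector in label_vectors.items():
        distance = euclidean_distance(sub_vector, feature_vector)
        if distance < best_distance:  # strictly smaller than the preset distance value
            best_label, best_distance = label, distance
    return best_label


def label_fault_vector(
    sub_vectors: List[Sequence[float]],
    label_vectors: Dict[str, Sequence[float]],
    max_distance: float,
) -> List[Optional[str]]:
    """Label every sub-vector of one fault data vector; unmatched sub-vectors get None."""
    return [match_label(sv, label_vectors, max_distance) for sv in sub_vectors]
```

Returning `None` for an unmatched sub-vector is one possible design choice; the embodiment itself only states what happens when a match is found.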
In one embodiment, the training the text classification model based on the training data, taking the trained text classification model as the fault policy model, includes:
inputting the training data into the text classification model, and updating the model parameters by back propagation for a preset number of training iterations;
after the text classification model completes the preset number of training iterations, determining, on a preset verification set, whether an evaluation index of the trained text classification model meets a preset convergence condition;
if the evaluation index does not meet the preset convergence condition, returning to the step of updating the model parameters by back propagation for the preset number of training iterations; or,
if the evaluation index meets the preset convergence condition, taking the trained text classification model as the fault strategy model.
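The train-then-verify loop in this embodiment can be sketched as follows. This is only an outline of the control flow: the callables `train_round` and `evaluate`, the reading of the convergence condition as "score at least `target_score`", and the `max_rounds` safety cap are all assumptions that do not appear in the text.

```python
from typing import Callable


def train_until_converged(
    train_round: Callable[[int], None],
    evaluate: Callable[[], float],
    steps_per_round: int,
    target_score: float,
    max_rounds: int = 100,
) -> int:
    """Repeat fixed-size training rounds until the validation score converges.

    Returns the number of rounds that were run.
    """
    for round_index in range(1, max_rounds + 1):
        # Update parameters by back propagation for the preset number of iterations.
        train_round(steps_per_round)
        # Check the evaluation index on the verification set (assumed: higher is better).
        if evaluate() >= target_score:
            return round_index
    raise RuntimeError("evaluation index did not converge within max_rounds")
```

In practice `train_round` would wrap the optimizer steps of the chosen text classification model and `evaluate` would compute a metric such as accuracy or F1 on the verification set; which metric is used is not stated in the text.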
In one embodiment, before obtaining the plurality of groups of keywords based on the fault cases and the operation and maintenance corpus, the method further includes:
acquiring text data corresponding to image data in the fault data set by optical character recognition (OCR), and replacing the image data with the corresponding text data to optimize the fault data set.
In a second aspect, the application provides a converter station fault strategy pushing method. The method comprises the following steps:
after receiving the current fault type sent by the user terminal, determining the similarity between the current fault type and the fault type of each fault case in the standard case library to obtain a plurality of similarities;
pushing converter station fault handling measures corresponding to the current fault type to the user terminal according to the similarities;
the standard case library is obtained by carrying out structural identification on a plurality of fault cases of the converter station through a fault strategy model; the fault policy model is trained by the method as described in the first aspect.
In one embodiment, the determining the similarity between the current fault type and the fault type of each fault case in the standard case library to obtain a plurality of similarities, and the pushing, to the user terminal, converter station fault handling measures corresponding to the current fault type according to the plurality of similarities, include:
obtaining a current fault vector corresponding to the current fault type and a standard fault vector corresponding to the fault type of each fault case in the standard case library;
calculating the cosine similarity score between the current fault vector and each standard fault vector to obtain a plurality of cosine similarity scores;
and obtaining, from the standard case library, the disposal measure corresponding to a fault type whose cosine similarity score meets a preset condition, and taking the disposal measure as the converter station fault handling measure pushed to the user terminal.
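The cosine similarity lookup in this embodiment can be illustrated with a short Python sketch. The "preset condition" is not defined in the text; here it is assumed to be a minimum score threshold, and the dictionary keys `fault_vector` and `measure` are hypothetical field names for a standard case.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; defined as 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_measures(
    current_vector: Sequence[float],
    case_library: List[Dict],
    min_score: float,
) -> List[Tuple[float, str]]:
    """Return (score, measure) pairs for cases scoring at least min_score, best first."""
    scored = [
        (cosine_similarity(current_vector, case["fault_vector"]), case["measure"])
        for case in case_library
    ]
    hits = [pair for pair in scored if pair[0] >= min_score]
    return sorted(hits, key=lambda pair: pair[0], reverse=True)
```

A top-k rule instead of a threshold would be an equally plausible reading of the preset condition.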
In one embodiment, each standard case in the standard case library stores a disposal measure together with the operation and maintenance role information corresponding to that disposal measure; after the disposal measures corresponding to the fault types whose cosine similarity scores meet the preset condition are obtained from the standard case library, the method further includes:
determining, from the disposal measures and according to the operation and maintenance role information corresponding to the user terminal, the converter station fault handling measure that matches the operation and maintenance role information, and taking it as the converter station fault handling measure pushed to the user terminal.
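Role-based filtering of the matched disposal measures can be sketched as below. The record layout (a `role` key and an `action` key per measure) and the example role names are hypothetical; the text only states that each disposal measure is stored together with its operation and maintenance role information.

```python
from typing import Dict, List


def measures_for_role(matched_measures: List[Dict[str, str]], role: str) -> List[str]:
    """Keep only the disposal actions whose stored role equals the terminal user's role."""
    return [m["action"] for m in matched_measures if m["role"] == role]


# Hypothetical example data, for illustration only.
example_measures = [
    {"role": "operator", "action": "confirm the alarm and report to dispatch"},
    {"role": "maintainer", "action": "inspect the affected equipment on site"},
]
operator_actions = measures_for_role(example_measures, "operator")
```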
In a third aspect, the application further provides a converter station fault strategy model training device. The device comprises:
the fault data acquisition module is used for acquiring a fault data set of the converter station, wherein the fault data set comprises fault cases and an operation and maintenance corpus of the converter station; the operation and maintenance corpus comprises fault operation and maintenance information of the converter station;
The fault vector determining module is used for obtaining a plurality of groups of keywords based on the fault cases and the operation and maintenance corpus, and one group of keywords is used for representing one fault; determining a plurality of fault data vectors corresponding to the converter station based on the plurality of groups of keywords;
the fault vector labeling module is used for labeling each fault data vector to obtain a label type sequence corresponding to each fault data vector; wherein the label type sequence includes at least two of a fault type label, a fault feature label, an inspection information label, and a disposal measure label; the disposal measure label is a binary label and comprises operation and maintenance role information and disposal information for the fault;
the model training module is used for taking the fault data vectors and the label type sequences as training data, training a text classification model based on the training data, and taking the trained text classification model as a fault strategy model; the fault strategy model is used for identifying structured fault data information from unstructured fault information, wherein the structured fault data information corresponds to information in the label type sequence.
In a fourth aspect, the application further provides a converter station fault strategy pushing device. The device comprises:
The similarity calculation module is used for determining the similarity between the current fault type and the fault type of each fault case in the standard case library after receiving the current fault type sent by the user terminal, so as to obtain a plurality of similarities;
the disposal measure pushing module is used for pushing the converter station fault disposal measure corresponding to the current fault type to the user terminal according to the plurality of similarities;
the standard case library is obtained by carrying out structural identification on a plurality of fault cases of the converter station through a fault strategy model; the fault policy model is trained by the method as described in the first aspect.
In a fifth aspect, the present application also provides a computer device. The computer device comprises a memory storing a computer program and a processor implementing the steps of the method in the first or second aspect when the processor executes the computer program.
In a sixth aspect, the present application also provides a computer readable storage medium. The computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method described in the first or second aspect.
In a seventh aspect, the present application also provides a computer program product. The computer program product comprising a computer program which, when executed by a processor, implements the steps of the method described in the first or second aspect.
With the above converter station fault strategy model training method, pushing method, apparatus, computer device, storage medium, and computer program product, a fault data set of the converter station is acquired, wherein the fault data set comprises fault cases and an operation and maintenance corpus of the converter station, and the operation and maintenance corpus comprises fault operation and maintenance information of the converter station. A plurality of groups of keywords are obtained based on the fault cases and the operation and maintenance corpus, one group of keywords representing one fault, and a plurality of fault data vectors corresponding to the converter station are determined based on the groups of keywords. Each fault data vector is labeled to obtain its corresponding label type sequence, wherein the label type sequence includes at least two of a fault type label, a fault feature label, an inspection information label, and a disposal measure label; the disposal measure label is a binary label comprising operation and maintenance role information and disposal information for the fault. The fault data vectors and the label type sequences are taken as training data, a text classification model is trained on the training data, and the trained text classification model is taken as the fault strategy model, which identifies structured fault data information from unstructured fault information, the structured fault data information corresponding to information in the label type sequence.
In other words, the text of the fault cases and the operation and maintenance corpus is vectorized and the resulting vectors are labeled, so that every fault data vector carries labels. This yields training data whose corpus consists of the fault cases and the operation and maintenance corpus, and training the text classification model on it yields a model that identifies structured fault data information from unstructured fault information, where the structured fault data information comprises a plurality of key descriptions of the fault. Furthermore, a structured standard case library can be extracted from the fault cases through the model, and converter station fault strategies can be pushed to maintenance personnel based on the standard case library, which improves how efficiently maintenance personnel use existing fault cases when handling sudden problems.
Drawings
Fig. 1 is an application environment diagram of a converter station fault policy model training method or a converter station fault policy pushing method in one embodiment;
Fig. 2 is a flow chart of a converter station fault strategy model training method in one embodiment;
Fig. 3 is a flow chart of a converter station fault strategy pushing method in one embodiment;
Fig. 4 is a flow chart of a converter station fault strategy model training method and a converter station fault strategy pushing method in one embodiment;
Fig. 5 is a block diagram of a converter station fault strategy model training device in one embodiment;
Fig. 6 is a block diagram of a converter station fault strategy pushing device in one embodiment;
Fig. 7 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
The converter station fault strategy model training method or the converter station fault strategy pushing method provided by the embodiment of the application can be applied to an application environment shown in fig. 1. The terminal 102 corresponding to the converter station communicates with the server 104 through a network, and the server 104 receives a fault data set, a standard case library and a converter station disposal measure of the converter station sent by the terminal 102. The data storage system may store data that the server 104 needs to process. The data storage system may be integrated on the server 104 or may be located on a cloud or other network server. The terminal 102 may be, but not limited to, various personal computers, notebook computers, smart phones, tablet computers, internet of things devices, and portable wearable devices, where the internet of things devices may be smart speakers, smart televisions, smart air conditioners, smart vehicle devices, and the like. The portable wearable device may be a smart watch, smart bracelet, headset, or the like. The server 104 may be implemented as a stand-alone server or as a server cluster of multiple servers.
In one embodiment, as shown in fig. 2, a method for training a converter station fault policy model is provided, and the method is applied to the server 104 in fig. 1 for illustration, and includes the following steps:
S202, acquiring a fault data set of a converter station, wherein the fault data set comprises fault cases and an operation and maintenance corpus of the converter station; the operation and maintenance corpus contains fault operation and maintenance information of the converter station.
The server collects a data set containing fault cases and an operation and maintenance corpus of the converter station. The fault cases are records written by operation and maintenance personnel after they handle converter station faults. The operation and maintenance corpus contains fault operation and maintenance information applicable to the converter station, for example a fault maintenance knowledge base for general equipment, the fault types of general equipment, common fault phenomena, emergency treatment tasks, and the like.
The data set should include information such as the fault case description, the fault type, the fault phenomenon, the locations to be inspected, and the emergency actions that operation and maintenance personnel in different roles should take. The data set may be obtained from converter station logs, existing fault reports, operation and maintenance documents, or expert knowledge.
It will be appreciated that the text content extracted from the data set needs a unified format. Preprocessing steps such as text cleaning, word segmentation, and stop word removal can be applied to ensure the consistency and processability of the text data.
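The three preprocessing steps named above can be sketched in a few lines of Python. This is a toy illustration that assumes English text split on whitespace and a small hand-written stop word list (`STOP_WORDS` is hypothetical); a corpus in another language would need a dedicated word segmentation tool in place of `split()`.

```python
import re
from typing import FrozenSet, List

# Hypothetical stop word list, for illustration only.
STOP_WORDS: FrozenSet[str] = frozenset({"the", "a", "an", "of", "and", "is", "was", "to"})


def preprocess(text: str, stop_words: FrozenSet[str] = STOP_WORDS) -> List[str]:
    """Clean, segment, and filter one piece of fault text."""
    # Text cleaning: lowercase and replace everything except letters, digits, spaces.
    cleaned = re.sub(r"[^0-9a-z\s]", " ", text.lower())
    # Word segmentation: whitespace split (assumption: space-delimited language).
    tokens = cleaned.split()
    # Stop word removal.
    return [token for token in tokens if token not in stop_words]
```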
Optionally, if the pictures and tables contained in the dataset are associated with text data of the fault case, the location information in the file is used to associate the content of the pictures and tables with the corresponding text, which helps to better understand the content of the fault case in subsequent task processing.
S204, obtaining a plurality of groups of keywords based on the fault cases and the operation and maintenance corpus, wherein one group of keywords is used for representing one fault; and determining a plurality of fault data vectors corresponding to the converter station based on the plurality of groups of keywords.
The text data can be segmented with a word segmentation tool to obtain the groups of keywords, or the required groups of keywords can be obtained by manual segmentation. The keywords in one fault case together characterize that fault and are therefore treated as one group of keywords. Accordingly, multiple groups of keywords correspond to multiple faults.
It should be appreciated that the keywords or text data may be vectorized in a variety of ways, for example word embedding, one-hot encoding, a bag-of-words model, TF-IDF (term frequency-inverse document frequency), or other text vectorization methods, to obtain the fault data vectors corresponding to the groups of keywords.
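As one concrete instance of the vectorization options listed above, the following sketch computes TF-IDF vectors for keyword groups in plain Python. It assumes the unsmoothed variant, tf = count / group length and idf = ln(N / document frequency); library implementations often apply smoothing and normalization, so their numbers will differ. The function name `tfidf_vectors` is hypothetical.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_vectors(keyword_groups: List[List[str]]) -> Tuple[List[str], List[List[float]]]:
    """Return the sorted vocabulary and one TF-IDF vector per keyword group."""
    vocabulary = sorted({word for group in keyword_groups for word in group})
    group_count = len(keyword_groups)
    # Document frequency: in how many groups each keyword appears.
    doc_freq = Counter(word for group in keyword_groups for word in set(group))
    vectors = []
    for group in keyword_groups:
        counts = Counter(group)
        total = len(group) or 1  # avoid division by zero for an empty group
        vectors.append(
            [
                (counts[word] / total) * math.log(group_count / doc_freq[word])
                for word in vocabulary
            ]
        )
    return vocabulary, vectors
```

A keyword that appears in every group receives weight zero under this variant, which is the intended effect of the inverse document frequency term.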
Optionally, the preprocessed text data is tokenized with a tokenizer and truncated or padded to a fixed length. The tokenized text is then converted into word embedding vectors, which can be obtained using a pre-trained RoBERTa model.
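Truncating or padding to a fixed length can be shown with a toy encoder. This sketch does not reproduce the RoBERTa tokenizer, which uses byte-level BPE and its own special token ids; the vocabulary, `PAD_ID`, and `UNK_ID` below are hypothetical values chosen only to show the fixed-length behavior.

```python
from typing import Dict, List

PAD_ID = 0  # hypothetical id used for padding positions
UNK_ID = 1  # hypothetical id used for out-of-vocabulary tokens


def encode_fixed_length(tokens: List[str], vocab: Dict[str, int], max_length: int) -> List[int]:
    """Map tokens to ids, truncate to max_length, then right-pad with PAD_ID."""
    ids = [vocab.get(token, UNK_ID) for token in tokens][:max_length]
    return ids + [PAD_ID] * (max_length - len(ids))
```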
S206, labeling each fault data vector to obtain a label type sequence corresponding to each fault data vector; wherein the label type sequence includes at least two of a fault type label, a fault feature label, an inspection information label, and a disposal measure label; the disposal measure label is a binary label and comprises operation and maintenance role information and disposal information for the fault.
It should be understood that training the model requires labeled training samples, so a label must be obtained for each fault data vector, and the label type sequence serves as the labeling content of the fault data vector. The labels that make up the label type sequence may be a fault type label, a fault feature label, an inspection information label, and a disposal measure label, so that each fault data vector receives multiple labels. The disposal measure label is a binary label: it comprises two sub-labels that represent, respectively, the operation and maintenance role information and the disposal information for the fault. The disposal measure label can therefore express the different fault disposal measures assigned to different operation and maintenance roles, so that different fault disposal measures can ultimately be determined for different roles.
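One way to picture the label type sequence is as a small record type. The sketch below is an assumed representation, not a structure given in the text: the class and field names are hypothetical, and `is_valid` encodes the stated requirement that at least two of the four label kinds are present.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class DisposalMeasureLabel:
    """Two-part disposal measure label: who acts, and what they do."""
    role: str    # operation and maintenance role information
    action: str  # disposal information for the fault


@dataclass
class LabelTypeSequence:
    """Labels attached to one fault data vector (hypothetical layout)."""
    fault_type: Optional[str] = None
    fault_feature: Optional[str] = None
    inspection_info: Optional[str] = None
    disposal_measures: List[DisposalMeasureLabel] = field(default_factory=list)

    def label_count(self) -> int:
        """Number of label kinds that are present."""
        single = [self.fault_type, self.fault_feature, self.inspection_info]
        return sum(value is not None for value in single) + (1 if self.disposal_measures else 0)

    def is_valid(self) -> bool:
        """The sequence must include at least two of the four label kinds."""
        return self.label_count() >= 2
```

Storing a list of `DisposalMeasureLabel` entries lets one fault carry a separate measure for each role, which matches the stated goal of determining different measures for different roles.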
Optionally, the labeling of fault type labels, fault feature labels, inspection information labels, and disposal measure labels may be treated as a sequence labeling task, with the labels encoded as sequence labels. The RoBERTa model may be used to classify each token in the sequence, determining whether the fault data vector to be labeled belongs to a fault type, a fault feature, inspection information, or a disposal measure. The disposal measures of operation and maintenance personnel in different roles are labeled by multi-label classification, with the disposal measure of each role encoded as a binary label. In this case, the RoBERTa model may classify the disposal measures corresponding to each role. In this way, the label encodings for the different tasks can be integrated into the same RoBERTa model, which can then identify fault data vectors and label them with label type sequences.
S208, taking the fault data vector and the mark type sequence as training data, training a text classification model based on the training data, and taking the trained text classification model as a fault strategy model; the fault policy model is used for identifying structured fault data information from unstructured fault information, wherein the structured fault data information corresponds to information in the mark type sequence.
It should be understood that each fault data vector corresponds to a label type sequence, so the two can be used together as training data for supervised training. Through training, the fault policy model learns to obtain structured fault data information from unstructured fault case data, where the structured information covers the types contained in the label type sequence.
For example, when an unstructured fault case is input, identification by the fault policy model may yield structured data as shown in the following table:
In this case, the fault policy model converts the unstructured fault case text into the fault text classifications in the table, corresponding to fault type, fault features, inspection information, and treatment measures. The structured information for the fault case is thus obtained directly, together with which treatment measure each operation and maintenance role in the fault case should execute.
Optionally, the fault policy model may be a RoBERTa model, which can simultaneously learn the associations among, and the feature representations of, fault types, fault phenomena, inspection information, and the treatment measures of different roles, enabling comprehensive text understanding and information extraction. The fault policy model may also be another model such as BERT (Bidirectional Encoder Representations from Transformers), a convolutional neural network, a recurrent neural network, or a self-attention model.
Taking the RoBERTa model as an example, the following are the main layers of the RoBERTa model and their corresponding functions:
Input embedding layer (Input Embedding Layer): converts the token sequence of the input text into an embedded vector representation. This layer uses token embeddings (Token Embeddings), position embeddings (Position Embeddings), and segment embeddings (Segment Embeddings) to represent the input.
Transformer encoder layers (Transformer Encoder Layers): the RoBERTa model is formed by stacking a plurality of Transformer encoder layers of the same structure. Each encoder layer contains a self-attention mechanism (Self-Attention Mechanism) and a feed-forward neural network (Feed-forward Neural Network). These encoder layers help capture the semantic relationships and contextual information of the input text.
Residual connections (Residual Connections): inside each Transformer encoder layer, a residual connection adds the layer's input to its output, which helps information flow between layers and mitigates the vanishing gradient problem.
Normalization layer (Layer Normalization): after each Transformer encoder layer, layer normalization is applied to normalize the input features, improving the robustness and training speed of the model.
Masking layer (Masking): the self-attention mechanism in the RoBERTa model employs a masking mechanism that prevents information leakage by masking (Mask) the input, ensuring that each token sees only the preceding tokens.
Pooling layer (Pooling Layer): the output of the RoBERTa model may be summarized by a pooling layer, such as average pooling or maximum pooling, to obtain a global representation or a fixed-length text representation.
On top of the RoBERTa model, an additional fully connected layer is added to perform feature extraction and classification. The model output is converted into a class probability distribution using a softmax activation function, and a conditional random field (CRF) loss function and an SGD optimizer are used for training.
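As a small illustration of the final step only, the sketch below shows how a softmax turns the raw scores of a classification head into a class probability distribution. It is written in plain Python under the assumption of a three-class output; the logit values are made up, and the encoder, CRF loss, and optimizer are not modeled here.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three label classes.
probs = softmax([2.0, 1.0, 0.1])
predicted = max(range(len(probs)), key=probs.__getitem__)
```

The index with the highest probability is taken as the predicted class.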
In the above converter station fault policy model training method, the texts of the converter station fault cases and of the operation and maintenance corpus are vectorized, and the resulting vectors are labeled so that each fault data vector carries labels. This yields training data whose corpus consists of the fault cases and the operation and maintenance corpus. A text classification model is trained on this training data to obtain a model that identifies structured fault data information from unstructured fault information. The model can be used to extract a structured standard case library from the fault cases, and converter station fault strategies can then be pushed to maintenance personnel on the basis of that case library, which improves how efficiently maintenance personnel use existing fault cases when dealing with sudden problems.
In one embodiment, the marking each fault data vector to obtain a marking type sequence corresponding to each fault data vector includes:
obtaining a mark type sequence template; the label type sequence templates consist of a plurality of label templates, and each label template corresponds to at least one preset label;
if the sub-vector of the fault data vector is matched with the target preset label of the label template, labeling the target preset label on the sub-vector of the fault data vector;
and after all sub-vectors of the fault data vector complete labeling corresponding to a plurality of label templates, obtaining a labeling type sequence corresponding to the fault data vector which completes labeling.
It will be appreciated that, when labeling, it is necessary to determine which fault data vectors need to be labeled and which label type sequence each should receive. A label type sequence template, consisting of a plurality of label templates, may therefore be used. For example, if the label templates are a fault type label template and a disposal measure label template, the sub-vectors of the fault data vector are traversed to judge whether each sub-vector matches the template corresponding to the fault type label. If it does not match, no labeling is performed; if it matches, the sub-vector is labeled with the fault type label. The disposal measure label is obtained in the same way.
Typically, each label template corresponds to a plurality of preset labels. For example, the fault type template may include preset labels such as hardware, software, and physical fault types. The template corresponding to the disposal measure label may have a plurality of preset sub-labels, for example a sub-label carrying operation and maintenance personnel information and a sub-label carrying the specific disposal measure; the preset sub-labels for operation and maintenance personnel information may include station leader, station chief, shift supervisor, operator, and operation and maintenance personnel.
After all the sub-vectors have been traversed and labeled against the various label templates, the labeling of the fault data vector is complete.
In this embodiment, through the above steps, the fault data vector can be labeled, and the labeled label type sequence corresponding to the fault data vector can be obtained.
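The traversal described in this embodiment can be sketched as below. This is an illustrative reading, not the patented implementation: the data layout (a template as a mapping from preset label to reference vector), the function name `label_fault_vector`, and the pluggable `matches` predicate are all assumptions introduced for the example.

```python
from typing import Callable, Dict, List, Optional, Sequence

Vector = Sequence[float]
# Assumed layout: a label template maps each preset label to a reference vector.
LabelTemplate = Dict[str, Vector]


def label_fault_vector(
    sub_vectors: List[Vector],
    templates: Dict[str, LabelTemplate],
    matches: Callable[[Vector, Vector], bool],
) -> List[Dict[str, Optional[str]]]:
    """For every sub-vector, try every template and record the first matching preset label."""
    sequence = []
    for sub in sub_vectors:
        tags: Dict[str, Optional[str]] = {}
        for template_name, presets in templates.items():
            tags[template_name] = next(
                (label for label, ref in presets.items() if matches(sub, ref)),
                None,  # no preset label matched, so the sub-vector stays unlabeled
            )
        sequence.append(tags)
    return sequence


templates = {"fault_type": {"hardware": [1.0, 0.0], "software": [0.0, 1.0]}}
exact = lambda a, b: list(a) == list(b)  # toy matcher for the demonstration
result = label_fault_vector([[1.0, 0.0], [0.5, 0.5]], templates, exact)
```

In practice the exact-equality matcher would be replaced by a similarity test between the sub-vector and the label feature vector.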
In one embodiment, if the sub-vector of the fault data vector matches the target preset label of the label template, labeling the target preset label for the sub-vector of the fault data vector includes:
calculating the space distance between the sub-vector of the fault data vector and the label feature vector corresponding to the target preset label;
if the space distance is smaller than a preset distance value, the label feature vector is matched with the sub-vector of the fault data vector, and the sub-vector of the fault data vector is marked with the mark type corresponding to the label feature vector.
It should be appreciated that whether a sub-vector matches a preset label may be determined from the relationship between the label feature vector corresponding to the label template and the sub-vector of the fault data vector. The spatial distance between the two vectors can be used to measure their degree of matching. A preset distance value is set as a threshold; when the spatial distance between the sub-vector and the label feature vector is smaller than the threshold, the two vectors are close and highly similar, so they are considered matched and the sub-vector is labeled.
In this embodiment, through the above steps, whether the two are similar can be determined based on the spatial distance, so as to obtain the matching degree between the sub-vector and the preset label.
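A minimal sketch of the distance-threshold matching follows. It assumes Euclidean distance as the spatial distance, which the text does not specify, and it picks the nearest label when several fall under the threshold, which is also an assumption; the vectors and the threshold value are made up.

```python
import math
from typing import Dict, Optional, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_label(
    sub_vector: Sequence[float],
    label_vectors: Dict[str, Sequence[float]],
    max_distance: float,
) -> Optional[str]:
    """Return the closest label whose distance is below the threshold, else None."""
    best_label, best_distance = None, max_distance
    for label, reference in label_vectors.items():
        distance = euclidean(sub_vector, reference)
        if distance < best_distance:  # strictly smaller than the preset value
            best_label, best_distance = label, distance
    return best_label


label_vectors = {"hardware": [1.0, 0.0], "software": [0.0, 1.0]}
near = match_label([0.9, 0.1], label_vectors, max_distance=0.5)
far = match_label([0.5, 0.5], label_vectors, max_distance=0.5)
```

Here `near` resolves to a label because the sub-vector lies close to one reference vector, while `far` stays unlabeled because both distances exceed the threshold.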
In one embodiment, the training the text classification model based on the training data, taking the trained text classification model as the fault policy model, includes:
inputting the training data into the text classification model, and back-propagating and updating model parameters according to preset training times;
after the text classification model finishes updating the training times, determining whether an evaluation index of the trained text classification model accords with a preset convergence condition according to a preset verification set;
If the evaluation index does not accord with the preset convergence condition, returning to execute the step of updating the model parameters by back propagation according to the preset training times; or,
and if the evaluation index meets the preset convergence condition, taking the trained text classification model as the fault strategy model.
It should be understood that the training data is input into the model and the model is trained for the preset number of training iterations; during this process, the parameters inside the model change continuously with each update until convergence is finally reached.
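The train-then-check loop of this embodiment can be sketched as follows. The callables `train_round` and `evaluate` are hypothetical stand-ins for the real back-propagation and validation steps, the convergence condition is assumed to be a minimum validation score, and the `max_rounds` safety cap is an addition not mentioned in the text.

```python
from typing import Callable


def train_until_converged(
    train_round: Callable[[int], None],
    evaluate: Callable[[], float],
    steps_per_round: int,
    target_score: float,
    max_rounds: int = 100,
) -> int:
    """Alternate fixed-size training rounds with validation; return the rounds used."""
    for round_index in range(1, max_rounds + 1):
        train_round(steps_per_round)    # parameter updates by back-propagation
        if evaluate() >= target_score:  # assumed form of the convergence condition
            return round_index
    raise RuntimeError("convergence condition not met within max_rounds")


# Toy stand-ins: each training step raises the validation score by 0.01.
state = {"score": 0.0}
rounds = train_until_converged(
    train_round=lambda steps: state.update(score=state["score"] + 0.01 * steps),
    evaluate=lambda: state["score"],
    steps_per_round=10,
    target_score=0.25,
)
```

If the evaluation index does not meet the condition, the loop simply returns to another round of updates, mirroring the "return to execute" branch of the method.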
Optionally, the model is trained using the prepared training data. The vectorized fault case description is taken as input, the corresponding label is taken as output, and the model parameters are updated through back propagation. The trained model is evaluated using a reserved validation set or cross validation method. And calculating indexes such as accuracy, recall rate, F1 score and the like to measure the performance of the model. And adjusting and improving the model according to the evaluation result.
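The evaluation indexes mentioned above can be computed as in the sketch below for a single positive class. Precision is included because the F1 score is defined from precision and recall; the label values in the example are made up.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str
) -> Dict[str, float]:
    """Accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = correct / len(pairs) if pairs else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


metrics = classification_metrics(
    y_true=["hardware", "hardware", "software", "software"],
    y_pred=["hardware", "software", "hardware", "software"],
    positive="hardware",
)
```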
In addition, operation and maintenance knowledge is added to the training of the model as a corpus, to help improve the accuracy and robustness of the model when extracting inspection points from the fault case text. The operation and maintenance knowledge may include various fault types, common fault phenomena, emergency handling tasks, and the like. Using this knowledge as additional training data allows the model to better understand the specific terms and context of the operation and maintenance field when learning the fault case text. The added knowledge provides richer background information, helping the model understand the fault case text and thereby improving the accuracy with which the corresponding inspection points are extracted.
The operation and maintenance knowledge is organized into text form and combined with the fault case data set. During training, the model simultaneously learns the semantic representation of the fault case text and the related information in the operation and maintenance knowledge, which improves its ability to recognize inspection points.
In this embodiment, through the above steps, a structured text data extraction model for converter station fault cases can be trained, so that key information in an unstructured fault case can be obtained rapidly.
In one embodiment, before obtaining the plurality of groups of keywords based on the fault cases and the operation and maintenance corpus, the method further includes:
and acquiring text data corresponding to the image data in the fault data set according to an OCR optical character recognition technology, and replacing the image data with the text data corresponding to the image data to optimize the fault data set.
It should be appreciated that, since the fault cases are mostly Word and PDF files, text extraction is required first. The text in a file can be extracted using tools such as pdfminer, textract, and python-docx. For documents containing pictures, OCR (Optical Character Recognition) technology may be used to extract text from the pictures; an OCR tool converts the text in a picture into editable text. For files containing tables, the table data may be extracted and converted into a structured form using tools such as tabula-py, python-docx, and pandas, which makes subsequent data processing and analysis more convenient.
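One possible way to organize the extraction step is a small dispatcher keyed on file extension, sketched below. The helper names and the extension table are assumptions; the sketch relies on `extract_text` from pdfminer.six and `Document` from python-docx, which are imported lazily so that the dispatcher itself works without those packages installed. OCR and table extraction are left out.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_pdf(path: str) -> str:
    """Extract plain text from a PDF using pdfminer.six."""
    from pdfminer.high_level import extract_text
    return extract_text(path)


def extract_docx(path: str) -> str:
    """Extract paragraph text from a .docx file using python-docx."""
    import docx
    return "\n".join(p.text for p in docx.Document(path).paragraphs)


# Assumed mapping from file extension to extractor.
EXTRACTORS: Dict[str, Callable[[str], str]] = {
    ".pdf": extract_pdf,
    ".docx": extract_docx,
}


def pick_extractor(path: str) -> Callable[[str], str]:
    """Choose an extractor by file extension, case-insensitively."""
    suffix = Path(path).suffix.lower()
    try:
        return EXTRACTORS[suffix]
    except KeyError:
        raise ValueError(f"unsupported file type: {suffix!r}") from None
```

A caller would then run `pick_extractor(path)(path)` for each fault case file and route unsupported types, such as scanned images, to an OCR step.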
In one embodiment, as shown in fig. 3, a method for pushing a fault policy of a converter station is provided, and the method is applied to the server 104 in fig. 1 for illustration, and includes the following steps:
S302, after receiving the current fault type sent by the user terminal, determining the similarity between the current fault type and the fault type of each fault case in the standard case library, and obtaining a plurality of similarities.
It should be appreciated that the scenario of converter station fault policy pushing is as follows: when facing the fault of the converter station, the user can input the current fault type and send the current fault type to the server 104 through the terminal, so as to obtain a specific fault strategy returned by the server 104.
The standard case library is obtained by carrying out structural identification on a plurality of fault cases of the converter station according to the trained fault strategy model, so that each fault case is structured data in the standard case library and comprises the fault type and the treatment measure of each fault case.
Since the current fault type transmitted by the user terminal is text data and the fault type of each fault case in the standard case library is also text data, the text similarity can be calculated.
And S303, pushing the converter station fault handling measures corresponding to the current fault type to the user terminal according to the similarities.
The standard case library is obtained by carrying out structural identification on a plurality of fault cases of the converter station through a fault strategy model; the fault strategy model is obtained through training the method in the converter station fault strategy model training method.
It should be understood that the higher the similarity, the better the current fault type matches a fault type in the standard case library, and thus the more similar the corresponding fault strategies are. On this basis, the corresponding treatment measures are retrieved from the standard case library and sent to the user terminal.
Optionally, each time a new fault case is input, the fault type, fault characteristics, inspection points and disposal measures in the fault case text are automatically extracted through model identification. Taking the identification result of a fault case of the action of a certain converter transformer pressure relief valve as an example:
Optionally, as an embodiment, the fault detection system is used together with the fault recognition system of the existing converter station. When a fault occurs, a plurality of similar fault cases can be selected automatically according to the fault type, and an emergency treatment strategy corresponding to the role of the operation and maintenance personnel is provided, thereby improving operation and maintenance efficiency.
Optionally, as an embodiment, each new fault case is added into the database as a new fault type, so as to improve the coverage rate of the fault case, and the new fault case is added into the data set as a new training sample, so as to improve the accuracy rate of named entity identification.
If the current fault case coincides with a fault case in the database, the fault case as handled after the current fault is used to update and optimize the similar fault case. If the current fault case does not coincide with any fault case in the database, the currently identified fault case is added as a new case.
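The update-or-add rule above can be sketched as a simple upsert. The sketch assumes that two cases "coincide" when they share the same fault type string and that the library is a dictionary keyed by fault type; both are simplifications introduced for illustration, and the case contents are made up.

```python
from typing import Dict


def upsert_case(library: Dict[str, dict], case: dict) -> str:
    """Update the matching case if one exists, otherwise add the case as new."""
    key = case["fault_type"]  # assumed coincidence criterion: identical fault type
    action = "updated" if key in library else "added"
    library[key] = case
    return action


library: Dict[str, dict] = {}
first = upsert_case(library, {"fault_type": "valve action", "measures": ["inspect"]})
second = upsert_case(library, {"fault_type": "valve action", "measures": ["inspect", "report"]})
```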
In one embodiment, the similarity between the current fault type and the fault type of each fault case in the standard case library is determined, so as to obtain a plurality of similarities; pushing, to the user terminal, a converter station fault handling measure corresponding to the current fault type according to the plurality of similarities, including:
obtaining a current fault vector corresponding to the current fault type and a standard fault vector corresponding to the fault type of each fault case in a standard case library;
calculating cosine similarity scores of the current fault type vector and each standard fault vector to obtain a plurality of cosine similarity scores;
and obtaining a disposal measure corresponding to the fault type of which the cosine similarity score reaches a preset condition from the standard case library, and taking the disposal measure as a converter station fault disposal measure pushed to the user terminal.
Alternatively, cosine similarity is used to measure the angle between the two vectors, reflecting their degree of similarity. For the output vectors of the two texts, cosine similarity between them can be calculated to evaluate their similarity. The method comprises the following specific steps:
(1) Vectorizing the input fault type;
(2) Inputting the vectorized text into a database, and obtaining an output vector representation of the existing fault type in the database;
(3) Calculating cosine similarity scores between vectors by using a cosine similarity formula;
$$\text{cosine similarity} = \frac{A \cdot B}{\|A\| \, \|B\|}$$
where vector $A$ and vector $B$ represent the output vectors of the two texts, $A \cdot B$ represents the dot product of the vectors, and $\|A\|$ and $\|B\|$ represent the norms (lengths) of the vectors;
(4) Judging the similarity of the texts according to the cosine similarity score. A score close to 1 indicates that the two texts are very similar, and a score close to -1 indicates that the two texts are very dissimilar.
In this embodiment, the cosine similarity between the current fault vector and the plurality of standard fault vectors can be rapidly calculated through the cosine similarity and the text vector.
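The scoring and selection steps can be sketched in plain Python as below. The preset condition is assumed here to be a minimum score, which the text leaves open, and the case names and vectors are invented for the example; a real system would use the vectors produced by the text model.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product divided by the product of the vector norms."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def rank_cases(
    current: Sequence[float],
    standard_vectors: Dict[str, Sequence[float]],
    min_score: float,
) -> List[Tuple[float, str]]:
    """Score every standard case and keep those meeting the assumed threshold."""
    scored = [(cosine_similarity(current, v), name) for name, v in standard_vectors.items()]
    return sorted((item for item in scored if item[0] >= min_score), reverse=True)


ranked = rank_cases(
    current=[1.0, 0.0],
    standard_vectors={
        "valve_trip": [1.0, 0.0],
        "cooling_fault": [0.0, 1.0],
        "valve_alarm": [0.8, 0.6],
    },
    min_score=0.5,
)
```

The treatment measures of the highest-ranked cases would then be looked up in the standard case library.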
In one embodiment, each standard case in the standard case library correspondingly stores a treatment measure and operation and maintenance role information corresponding to the treatment measure; after the treatment measures corresponding to the fault types, for which the cosine similarity scores reach the preset conditions, are obtained from the standard case library, the treatment measures further comprise:
Determining, from the treatment measures and according to the operation and maintenance role information corresponding to the user terminal, the converter station fault handling measure that matches the operation and maintenance role information, and taking it as the converter station fault handling measure pushed to the user terminal.
It should be understood that the user terminal sends its operation and maintenance role information together with the current fault type, so the role information identifies which role is handling the current fault. To push fault handling measures more accurately, the converter station fault handling measure corresponding to the terminal's operation and maintenance role information is then selected from the treatment measures of the matched standard case.
In this embodiment, through the operation and maintenance role information of the terminal, the fault handling measures of the converter station corresponding to the operation and maintenance role can be matched, so as to achieve the effect of pushing the corresponding fault handling measures for the operation and maintenance role.
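A minimal sketch of the role-based selection follows. The record layout (a list of role and measure pairs per standard case) and all the values in it are hypothetical; the text only states that each treatment measure is stored with its operation and maintenance role information.

```python
from typing import Dict, List


def measures_for_role(case: Dict, role: str) -> List[str]:
    """Keep only the treatment measures stored for the requesting role."""
    return [entry["measure"] for entry in case["measures"] if entry["role"] == role]


# Hypothetical structured case, invented for illustration.
case = {
    "fault_type": "pressure relief valve action",
    "measures": [
        {"role": "shift supervisor", "measure": "report to dispatch"},
        {"role": "operator", "measure": "inspect the valve on site"},
    ],
}
pushed = measures_for_role(case, "operator")
```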
Optionally, as an embodiment, as shown in fig. 4, the method includes:
S410, collecting fault cases of converter station equipment, and preprocessing the fault cases to form a data set.
S420, improving the RoBERTa model, and training the model using the fault cases and the operation and maintenance knowledge texts.
S430, extracting fault event features (including fault types, fault phenomena, inspection information, and role-specific treatment measures) from the fault cases using the trained model.
S440, establishing a standard structured fault case library of the converter station.
S450, inputting the type of the fault, and outputting the fault case with the highest similarity.
After S430, if a new fault case arises, the process goes to S431, in which the new fault case is input, the intelligently identified structured fault case text is output, and the structured fault case text is added to the standard structured fault case library.
It should be understood that, although the steps in the flowcharts of the above embodiments are shown sequentially as indicated by the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include a plurality of sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; nor are these sub-steps or stages necessarily performed sequentially, as they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
Based on the same inventive concept, the embodiment of the application also provides a converter station fault strategy model training device for realizing the above-mentioned converter station fault strategy model training method. The implementation scheme of the solution provided by the device is similar to the implementation scheme recorded in the method, so the specific limitation in the embodiments of the device for training the fault policy model of the converter station provided below can be referred to the limitation of the training method of the fault policy model of the converter station in the above description, and the description is omitted here.
In one embodiment, as shown in fig. 5, there is provided a converter station fault policy model training apparatus 500, comprising: a fault data acquisition module 501, a fault vector determination module 502, a fault vector labeling module 503, and a model training module 504, wherein:
a fault data obtaining module 501, configured to obtain a fault data set of a converter station, where the fault data set includes a fault case and an operation and maintenance corpus of the converter station; the operation and maintenance corpus comprises fault operation and maintenance information of the converter station;
the fault vector determining module 502 is configured to obtain a plurality of groups of keywords based on the fault case and the operation and maintenance corpus, where one group of keywords is used to characterize a fault; determining a plurality of fault data vectors corresponding to the converter station based on the plurality of groups of keywords;
A fault vector labeling module 503, configured to label each fault data vector, so as to obtain a label type sequence corresponding to each fault data vector; wherein the tag type sequence includes at least two of a fault type tag, a fault feature tag, an inspection information tag, and a disposition measure tag; the disposal measure label is a binary label and comprises operation and maintenance role information and disposal information aiming at faults;
the model training module 504 is configured to use the fault data vector and the tag type sequence as training data, train a text classification model based on the training data, and use the trained text classification model as a fault policy model; the fault policy model is used for identifying structured fault data information from unstructured fault information, wherein the structured fault data information corresponds to information in the mark type sequence.
Further, in one embodiment, the fault vector labeling module 503 is further configured to:
obtaining a mark type sequence template; the label type sequence templates consist of a plurality of label templates, and each label template corresponds to at least one preset label;
If the sub-vector of the fault data vector is matched with the target preset label of the label template, labeling the target preset label on the sub-vector of the fault data vector;
and after all sub-vectors of the fault data vector complete labeling corresponding to a plurality of label templates, obtaining a labeling type sequence corresponding to the fault data vector which completes labeling.
Further, in one embodiment, the fault vector labeling module 503 is further configured to:
calculating the space distance between the sub-vector of the fault data vector and the label feature vector corresponding to the target preset label;
if the space distance is smaller than a preset distance value, the label feature vector is matched with the sub-vector of the fault data vector, and the sub-vector of the fault data vector is marked with the mark type corresponding to the label feature vector.
Further, in one embodiment, the model training module 504 is further configured to:
inputting the training data into the text classification model, and back-propagating and updating model parameters according to preset training times;
after the text classification model finishes updating the training times, determining whether an evaluation index of the trained text classification model accords with a preset convergence condition according to a preset verification set;
If the evaluation index does not accord with the preset convergence condition, returning to execute the step of updating the model parameters by back propagation according to the preset training times; or,
and if the evaluation index meets the preset convergence condition, taking the trained text classification model as the fault strategy model.
Further, in one embodiment, the converter station fault policy model training device 500 further provides a character recognition module, configured to obtain text data corresponding to image data in the fault data set according to an OCR optical character recognition technology, and replace the image data with the text data corresponding to the image data to optimize the fault data set.
The modules in the above converter station fault policy model training device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor of the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.
Based on the same inventive concept, the embodiment of the application also provides a converter station fault strategy pushing device for realizing the above related converter station fault strategy pushing method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of the present application for one or more fault policy pushing devices for a converter station may be referred to the limitation of the fault policy pushing method for a converter station in the foregoing description, which is not repeated herein.
In one embodiment, as shown in fig. 6, there is provided a converter station fault policy pushing device 600, including: a similarity calculation module 601 and a disposition measure pushing module 602, wherein:
the similarity calculation module 601 is configured to determine, after receiving a current fault type sent by a user terminal, a similarity between the current fault type and a fault type of each fault case in a standard case library, so as to obtain a plurality of similarities;
a handling measure pushing module 602, configured to push, to the user terminal, a converter station fault handling measure corresponding to the current fault type according to the multiple similarities;
the standard case library is obtained by carrying out structural identification on a plurality of fault cases of the converter station through a fault strategy model; the fault strategy model is obtained through training by the converter station fault strategy model training method.
Further, in one embodiment, the similarity calculation module 601 is further configured to:
obtaining a current fault vector corresponding to the current fault type and a standard fault vector corresponding to the fault type of each fault case in a standard case library;
and calculating cosine similarity scores of the current fault type vector and each standard fault vector to obtain a plurality of cosine similarity scores.
The disposition measure pushing module 602 is further configured to:
and obtaining a disposal measure corresponding to the fault type of which the cosine similarity score reaches a preset condition from the standard case library, and taking the disposal measure as a converter station fault disposal measure pushed to the user terminal.
Further, in an embodiment, the disposition measure pushing module 602 is further configured to determine, from the disposition measures, a converter station fault disposition measure matching the operation and maintenance role information according to the operation and maintenance role information corresponding to the user terminal, as a converter station fault disposition measure pushed to the user terminal.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is for storing a fault dataset of the converter station. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a converter station fault strategy model training method or a converter station fault strategy pushing method.
It will be appreciated by those skilled in the art that the structure shown in FIG. 7 is merely a block diagram of some of the structures associated with the present inventive arrangements and is not limiting of the computer device to which the present inventive arrangements may be applied, and that a particular computer device may include more or fewer components than shown, or may combine some of the components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) involved in the present application are information and data authorized by the user or fully authorized by each party, and the collection, use, and processing of the related data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.
Those skilled in the art will appreciate that all or part of the methods described above may be implemented by a computer program stored on a non-transitory computer-readable storage medium; when executed, the program may carry out the steps of the method embodiments described above. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include read-only memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (MRAM), ferroelectric random access memory (FRAM), phase change memory (PCM), graphene memory, and the like. Volatile memory may include random access memory (RAM), external cache memory, and the like. By way of illustration and not limitation, RAM may take a variety of forms, such as static random access memory (SRAM) or dynamic random access memory (DRAM). The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database. The processor referred to in the embodiments provided herein may be, but is not limited to, a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, or a data processing logic unit based on quantum computing.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, any combination that involves no contradiction should be considered within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application. They are described in specific detail, but this should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the scope of the application. Accordingly, the scope of protection of the application shall be determined by the appended claims.
Claims (13)
1. A method for training a converter station fault strategy model, the method comprising:
acquiring a fault data set of a converter station, wherein the fault data set comprises fault cases and an operation and maintenance corpus of the converter station; the operation and maintenance corpus comprises fault operation and maintenance information of the converter station;
obtaining a plurality of groups of keywords based on the fault cases and the operation and maintenance corpus, wherein one group of keywords is used for representing one fault; determining a plurality of fault data vectors corresponding to the converter station based on the plurality of groups of keywords;
labeling each fault data vector to obtain a label type sequence corresponding to each fault data vector; wherein the label type sequence includes at least two of a fault type label, a fault feature label, an inspection information label, and a disposition measure label; the disposition measure label is a binary label comprising operation and maintenance role information and disposition information for the fault;
taking the fault data vector and the label type sequence as training data, training a text classification model based on the training data, and taking the trained text classification model as a fault strategy model; the fault strategy model is used for identifying structured fault data information from unstructured fault information, wherein the structured fault data information corresponds to information in the label type sequence.
2. The method of claim 1, wherein labeling each fault data vector to obtain a label type sequence corresponding to each fault data vector comprises:
obtaining a label type sequence template; the label type sequence template consists of a plurality of label templates, and each label template corresponds to at least one preset label;
if a sub-vector of the fault data vector matches a target preset label of a label template, labeling the sub-vector of the fault data vector with the target preset label;
after all sub-vectors of the fault data vector have been labeled against the plurality of label templates, obtaining the label type sequence corresponding to the labeled fault data vector.
3. The method of claim 2, wherein labeling the sub-vector of the fault data vector with the target preset tag if the sub-vector of the fault data vector matches the target preset tag of the tag template, comprises:
calculating the spatial distance between the sub-vector of the fault data vector and the label feature vector corresponding to the target preset label;
if the spatial distance is smaller than a preset distance value, determining that the label feature vector matches the sub-vector of the fault data vector, and labeling the sub-vector of the fault data vector with the label type corresponding to the label feature vector.
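The distance-based matching of claim 3 can be illustrated with a short sketch. The claim does not name a distance metric, so Euclidean distance is assumed here; the function names, the two-dimensional label feature vectors, and the threshold value are hypothetical. Picking the closest label when several fall under the threshold is also a choice made for this sketch rather than something the claim specifies.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors (assumed metric)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def label_sub_vector(sub_vector, preset_labels, max_distance):
    """Return the nearest preset label whose distance is below max_distance.

    preset_labels maps a label name to its label feature vector.
    Returns None when no preset label is close enough to match.
    """
    best_label, best_distance = None, max_distance
    for label, feature_vector in preset_labels.items():
        distance = euclidean(sub_vector, feature_vector)
        if distance < best_distance:  # strictly smaller, as in the claim
            best_label, best_distance = label, distance
    return best_label


# Hypothetical preset labels of one label template (fault type).
fault_type_labels = {
    "valve cooling fault": [1.0, 0.0],
    "DC line fault": [0.0, 1.0],
}

print(label_sub_vector([0.9, 0.1], fault_type_labels, max_distance=0.5))
print(label_sub_vector([0.5, 0.5], fault_type_labels, max_distance=0.5))
```

The first sub-vector lies about 0.14 from the "valve cooling fault" feature vector and is labeled accordingly; the second is about 0.71 from both feature vectors, so it stays unlabeled.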
4. The method of claim 1, wherein training a text classification model based on the training data and taking the trained text classification model as a fault strategy model comprises:
inputting the training data into the text classification model, and updating the model parameters by back propagation for a preset number of training iterations;
after the text classification model has been updated for the preset number of training iterations, determining, according to a preset verification set, whether an evaluation index of the trained text classification model meets a preset convergence condition;
if the evaluation index does not meet the preset convergence condition, returning to the step of updating the model parameters by back propagation for the preset number of training iterations; or,
if the evaluation index meets the preset convergence condition, taking the trained text classification model as the fault strategy model.
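The control flow of claim 4 (a fixed number of updates, then a check against the verification set, then either another round or acceptance) can be sketched as below. Everything here is an assumption made for illustration: `train_until_converged` and `ToyClassifier` are hypothetical names, the toy model is a one-dimensional logistic regression trained by plain gradient descent as a stand-in for the text classification model and its back propagation, accuracy is assumed as the evaluation index, and the `max_rounds` guard is added only so the sketch cannot loop forever.

```python
import math


def train_until_converged(train_step, evaluate, steps_per_round, target, max_rounds=100):
    """Run rounds of steps_per_round updates until evaluate() reaches target."""
    for round_index in range(1, max_rounds + 1):
        for _ in range(steps_per_round):
            train_step()                 # one parameter update
        metric = evaluate()              # evaluation index on the verification set
        if metric >= target:             # assumed convergence condition
            return round_index, metric
    raise RuntimeError("evaluation index did not reach the target")


class ToyClassifier:
    """Stand-in model: 1-D logistic regression with gradient-descent updates."""

    def __init__(self, learning_rate=0.5):
        self.w, self.b, self.learning_rate = 0.0, 0.0, learning_rate

    def predict_proba(self, x):
        return 1.0 / (1.0 + math.exp(-(self.w * x + self.b)))

    def step(self, data):
        # Gradient of the mean cross-entropy loss with respect to w and b.
        grad_w = sum((self.predict_proba(x) - y) * x for x, y in data) / len(data)
        grad_b = sum(self.predict_proba(x) - y for x, y in data) / len(data)
        self.w -= self.learning_rate * grad_w
        self.b -= self.learning_rate * grad_b

    def accuracy(self, data):
        hits = sum((self.predict_proba(x) >= 0.5) == bool(y) for x, y in data)
        return hits / len(data)


training_data = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]   # made-up samples
verification_set = [(-1.5, 0), (1.5, 1)]

model = ToyClassifier()
rounds, accuracy = train_until_converged(
    train_step=lambda: model.step(training_data),
    evaluate=lambda: model.accuracy(verification_set),
    steps_per_round=10,
    target=1.0,
)
print(rounds, accuracy)
```

Passing the update and evaluation steps as callables keeps the loop independent of the model, so the same control flow would apply to a real text classifier trained with a deep learning framework.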
5. The method of claim 1, wherein before obtaining the plurality of groups of keywords based on the fault cases and the operation and maintenance corpus, the method further comprises:
acquiring, by optical character recognition (OCR) technology, text data corresponding to image data in the fault data set, and replacing the image data with the corresponding text data to optimize the fault data set.
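The replacement step of claim 5 can be sketched as a small preprocessing pass. The record layout (`kind` and `content` keys) and the function names are hypothetical, and no particular OCR engine is assumed: the recognizer is passed in as a callable, and the example uses a lookup-table stand-in so that it runs without any OCR library installed.

```python
def replace_images_with_text(records, recognize_text):
    """Return a new data set in which image records are replaced by text records.

    recognize_text is any callable that maps image bytes to recognized text;
    in practice it would wrap an OCR engine.
    """
    cleaned = []
    for record in records:
        if record["kind"] == "image":
            text = recognize_text(record["content"])
            cleaned.append({"kind": "text", "content": text})
        else:
            cleaned.append(dict(record))  # copy so the input stays untouched
    return cleaned


# Stand-in recognizer for the example only: a fixed lookup table, not real OCR.
_FAKE_OCR_RESULTS = {b"scan-001": "Valve cooling water leakage alarm"}


def fake_recognize_text(image_bytes):
    return _FAKE_OCR_RESULTS[image_bytes]


fault_data_set = [
    {"kind": "text", "content": "Converter transformer oil temperature high"},
    {"kind": "image", "content": b"scan-001"},
]

optimized = replace_images_with_text(fault_data_set, fake_recognize_text)
for record in optimized:
    print(record["kind"], "|", record["content"])
```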
6. A converter station fault strategy pushing method, characterized in that the method comprises:
after receiving the current fault type sent by the user terminal, determining the similarity between the current fault type and the fault type of each fault case in the standard case library to obtain a plurality of similarities;
pushing converter station fault disposition measures corresponding to the current fault type to the user terminal according to the plurality of similarities;
the standard case library is obtained by performing structured identification on a plurality of fault cases of the converter station through a fault strategy model; the fault strategy model is trained by the method of any one of claims 1 to 5.
7. The method of claim 6, wherein determining the similarity between the current fault type and the fault type of each fault case in the standard case library to obtain a plurality of similarities, and pushing, to the user terminal, a converter station fault disposition measure corresponding to the current fault type according to the plurality of similarities, comprises:
obtaining a current fault vector corresponding to the current fault type and a standard fault vector corresponding to the fault type of each fault case in the standard case library;
calculating a cosine similarity score between the current fault vector and each standard fault vector to obtain a plurality of cosine similarity scores;
obtaining, from the standard case library, the disposition measure corresponding to the fault type whose cosine similarity score meets a preset condition, and taking that disposition measure as the converter station fault disposition measure pushed to the user terminal.
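The retrieval in claim 7 can be illustrated with a minimal sketch. The claim leaves the "preset condition" open, so a minimum-score threshold is assumed here; the case library layout, the three-dimensional vectors, the fault types, and the measure texts are made up for the example, and `select_measures` is a hypothetical helper name.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_measures(current_vector, case_library, min_score):
    """Return (fault_type, measure, score) for cases scoring at least min_score,
    ordered from most to least similar."""
    scored = [
        (cosine_similarity(current_vector, case["vector"]), case)
        for case in case_library
    ]
    matches = [(score, case) for score, case in scored if score >= min_score]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [(case["fault_type"], case["measure"], score) for score, case in matches]


# Hypothetical standard case library with made-up standard fault vectors.
standard_case_library = [
    {"fault_type": "valve cooling fault", "vector": [1.0, 0.0, 0.0],
     "measure": "Check the cooling water circuit."},
    {"fault_type": "DC line fault", "vector": [0.0, 1.0, 0.0],
     "measure": "Patrol the DC line."},
    {"fault_type": "combined fault", "vector": [1.0, 1.0, 0.0],
     "measure": "Escalate to the duty supervisor."},
]

results = select_measures([1.0, 0.1, 0.0], standard_case_library, min_score=0.9)
for fault_type, measure, score in results:
    print(fault_type, "|", measure, "|", round(score, 3))
```

With these vectors only the "valve cooling fault" case passes the 0.9 threshold; the other two score roughly 0.10 and 0.77.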
8. The method of claim 7, wherein each standard case in the standard case library stores a disposition measure together with the operation and maintenance role information corresponding to that disposition measure; after the disposition measures corresponding to the fault types whose cosine similarity scores meet the preset condition are obtained from the standard case library, the method further comprises:
determining, from the disposition measures and according to the operation and maintenance role information corresponding to the user terminal, the converter station fault disposition measure that matches the operation and maintenance role information, and taking it as the converter station fault disposition measure pushed to the user terminal.
9. A converter station fault strategy model training device, the device comprising:
the fault data acquisition module is used for acquiring a fault data set of the converter station, wherein the fault data set comprises fault cases and an operation and maintenance corpus of the converter station; the operation and maintenance corpus comprises fault operation and maintenance information of the converter station;
the fault vector determining module is used for obtaining a plurality of groups of keywords based on the fault cases and the operation and maintenance corpus, and one group of keywords is used for representing one fault; determining a plurality of fault data vectors corresponding to the converter station based on the plurality of groups of keywords;
the fault vector labeling module is used for labeling each fault data vector to obtain a label type sequence corresponding to each fault data vector; wherein the label type sequence includes at least two of a fault type label, a fault feature label, an inspection information label, and a disposition measure label; the disposition measure label is a binary label comprising operation and maintenance role information and disposition information for the fault;
the model training module is used for taking the fault data vector and the label type sequence as training data, training a text classification model based on the training data, and taking the trained text classification model as a fault strategy model; the fault strategy model is used for identifying structured fault data information from unstructured fault information, wherein the structured fault data information corresponds to information in the label type sequence.
10. A converter station fault strategy pushing device, characterized in that the device comprises:
the similarity calculation module is used for determining the similarity between the current fault type and the fault type of each fault case in the standard case library after receiving the current fault type sent by the user terminal, so as to obtain a plurality of similarities;
the disposition measure pushing module is used for pushing the converter station fault disposition measure corresponding to the current fault type to the user terminal according to the plurality of similarities;
the standard case library is obtained by performing structured identification on a plurality of fault cases of the converter station through a fault strategy model; the fault strategy model is trained by the method of any one of claims 1 to 5.
11. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when the computer program is executed.
12. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 8.
13. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310811621.7A CN117009516A (en) | 2023-07-04 | 2023-07-04 | Converter station fault strategy model training method, pushing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310811621.7A CN117009516A (en) | 2023-07-04 | 2023-07-04 | Converter station fault strategy model training method, pushing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117009516A (en) | 2023-11-07 |
Family
ID=88562806
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310811621.7A Pending CN117009516A (en) | 2023-07-04 | 2023-07-04 | Converter station fault strategy model training method, pushing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117009516A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117707819A (en) * | 2023-11-27 | 2024-03-15 | 苏州盖雅信息技术有限公司 | Rapid fault defining method based on TF-IDF weighted cosine similarity algorithm |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111581396B (en) | Event graph construction system and method based on multi-dimensional feature fusion and dependency syntax | |
CN111159407B (en) | Method, apparatus, device and medium for training entity recognition and relation classification model | |
CN112084337A (en) | Training method of text classification model, and text classification method and equipment | |
CN111738001B (en) | Training method of synonym recognition model, synonym determination method and equipment | |
CN112966074A (en) | Emotion analysis method and device, electronic equipment and storage medium | |
CN116097250A (en) | Layout aware multimodal pre-training for multimodal document understanding | |
CN113971210B (en) | Data dictionary generation method and device, electronic equipment and storage medium | |
CN117217277A (en) | Pre-training method, device, equipment, storage medium and product of language model | |
CN117009516A (en) | Converter station fault strategy model training method, pushing method and device | |
CN111831624A (en) | Data table creating method and device, computer equipment and storage medium | |
CN113392191B (en) | Text matching method and device based on multi-dimensional semantic joint learning | |
CN117709317A (en) | Report file processing method and device and electronic equipment | |
CN109902162B (en) | Text similarity identification method based on digital fingerprints, storage medium and device | |
WO2023134085A1 (en) | Question answer prediction method and prediction apparatus, electronic device, and storage medium | |
CN117216617A (en) | Text classification model training method, device, computer equipment and storage medium | |
CN116127087A (en) | Knowledge graph construction method and device, electronic equipment and storage medium | |
CN111199170B (en) | Formula file identification method and device, electronic equipment and storage medium | |
CN115203388A (en) | Machine reading understanding method and device, computer equipment and storage medium | |
CN114510561A (en) | Answer selection method, device, equipment and storage medium | |
CN112749251B (en) | Text processing method, device, computer equipment and storage medium | |
CN114325384A (en) | Crowdsourcing acquisition system and method based on motor fault knowledge | |
CN114610882A (en) | Abnormal equipment code detection method and system based on electric power short text classification | |
CN113761126A (en) | Text content identification method, text content identification device, text content identification equipment and readable storage medium | |
CN113821571A (en) | Food safety relation extraction method based on BERT and improved PCNN | |
CN113157892A (en) | User intention processing method and device, computer equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||