CN112632270A - Sequence labeling method for electric power plan text based on long-time memory network - Google Patents

Info

Publication number
CN112632270A
Authority
CN
China
Prior art keywords
text
power plan
electric power
labeling
long
Prior art date
Legal status
Pending
Application number
CN201910909528.3A
Other languages
Chinese (zh)
Inventor
杨群
周凯
刘绍翰
Current Assignee
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date
2019-09-23
Filing date
2019-09-23
Publication date
2021-04-09
Application filed by Nanjing University of Aeronautics and Astronautics
Priority to CN201910909528.3A
Publication of CN112632270A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Economics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Water Supply & Treatment (AREA)
  • Tourism & Hospitality (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Primary Health Care (AREA)
  • Marketing (AREA)
  • Human Resources & Organizations (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a power plan text labeling technique based on a long short-term memory (LSTM) neural network. The disclosed technique comprises: a data processing method for power plan text; a method for building a domain-specific dictionary for the power plan field; a word segmentation method for power plan text labeling; and an LSTM network structure and training method for power plan text labeling. The data processing method converts raw plan text into samples that can be used to train the network. The dictionary construction method collects the domain-specific words that appear in the training samples into a reusable dictionary and assigns each word in the dictionary a corresponding identifier. The word segmentation method for power plan labeling segments text on the basis of the domain dictionary, so that general-vocabulary segmentation does not interfere. The invention also discloses the LSTM network structure for power plan text labeling, its training method, and its parameter settings.

Description

Sequence labeling method for electric power plan text based on long-time memory network
Technical Field
The invention relates to natural language processing and deep learning, and in particular to a text sequence labeling technique for analyzing power plans.
Background
The conventional power grid dispatching system depends on the subjective decisions of dispatchers; dispatching is intense mental labor and places extremely high demands on the dispatcher's reliability. Compared with a dispatcher, a computer offers high running speed, strong real-time performance, large storage capacity, and high reliability. A human dispatcher, by contrast, can sustain only limited work intensity for a short time, responds slowly, has limited memory, works with uneven efficiency, and is easily affected by the environment. With the rapid development of the power system, grid dispatching operation has become increasingly strained: dispatchers' workload and working hours grow, they become tired and impatient, and improper dispatching instructions are easily issued. A system that assists, or even replaces, the dispatcher through machine management can effectively improve dispatching performance and reliability, which is of great and urgent practical significance for reducing safety accidents.
Existing dispatching automation systems have accumulated large volumes of alarm information, accident reports, equipment status information, and other data from different sources and in different forms. Current data analysis of dispatching automation systems centers on centrally stored alarm information and usually relies on traditional artificial intelligence methods such as statistical analysis and genetic algorithms. Deep learning on big data has achieved major breakthroughs in artificial intelligence, with great success in speech recognition, natural language processing, computer vision, and image and video analysis. Combining pattern recognition and machine learning algorithms with text clustering or text classification therefore makes it possible to analyze the correlations between grid fault events from multiple perspectives and at multiple granularities.
Disclosure of Invention
1. A power plan text labeling technique based on a long short-term memory (LSTM) neural network, characterized in that the LSTM neural network performs sequence labeling on power plan text, and the text labeling method comprises:
establishing a domain-specific dictionary for the power plan field;
a word segmentation method for power plan text sequence labeling;
a data labeling method and sample format for power plan text;
an LSTM network structure and training method for power plan text sequence labeling.
2. The text labeling technique according to claim 1, wherein the application field is power plan text, and the characteristics that make power plan text difficult to process specifically comprise:
power plan text contains a large number of domain-specific terms, including place names, substation names, line names, abbreviations, and the like.
In addition to these domain-specific terms, the syntax of plan text differs from everyday language, so general-purpose tools process plan text poorly. The invention therefore provides a word segmentation method for power plan text that segments words based on a power plan domain dictionary. The dictionary is built by first summarizing rules from a large number of power plans and then having experts correct the result manually, and it is continually updated and expanded with each use.
3. The power plan text data labeling method and sample format according to claim 1, specifically comprising:
the method classifies the application scenarios involved in power plans and labels different components for the plan texts of different scenarios. Power experts analyzed the plan texts and defined ten scenario types, and the text of each scenario is labeled with a finite set of components. These components are also called slot names.
In the sample format, one plan text is converted into multiple lines, each line consisting of two parts: a word of the original text and its slot name. Consecutive plans are separated by a blank line.
4. The LSTM neural network for power plan text sequence labeling and its training method according to claim 1, specifically comprising:
the LSTM neural network used for power plan text sequence labeling is a bidirectional LSTM neural network;
the training method comprises parameter settings, the number of training iterations (epochs), and accuracy analysis.
Description of Drawings
To illustrate the embodiments of the present invention or the prior-art solutions more clearly, the drawings used in the embodiments are briefly described below. The drawings show only some embodiments of the invention; a person skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a flowchart of the training process of the LSTM neural network model for power plan text sequence labeling in an embodiment of the present invention;
FIG. 2 is a schematic diagram of the LSTM neural network model for power plan text sequence labeling in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art obtains from these embodiments without creative effort fall within the protection scope of the invention.
To make the above objects, features, and advantages of the present invention easier to understand, the embodiments are described in further detail below with reference to the drawings.
FIG. 1 is a flowchart of training the LSTM neural network model for power plan text sequence labeling according to an embodiment of the present invention. The embodiment is implemented with a deep learning toolkit. In outline, given a set of power plan data, the data is processed and an LSTM-based deep neural network model is trained to label the sequences in that data. The data is first split into a training set and a test set in a 4:1 ratio. As shown in FIG. 1, training the LSTM model for power plan text sequence labeling comprises the following steps:
Step 101: This step prepares prior knowledge of the power plan field, chiefly by building a domain-specific dictionary for power plans. The dictionary consists of domain terms collected from a large body of power plan texts. Because these terms rarely appear in everyday language, existing word segmentation tools handle them poorly, so power experts compile the dictionary manually with the help of text processing tools.
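A dictionary that gives every domain word a stable identifier, as the abstract describes, might look like the sketch below. The class name `DomainDictionary`, the category strings, and the export to jieba's user-dictionary line layout (`word freq tag`) are illustrative assumptions, not the patent's implementation; the frequency 1000 is an arbitrary, fairly high value.

```python
from typing import Dict, Iterable, List, Optional


class DomainDictionary:
    """Hypothetical power plan domain dictionary with stable integer ids."""

    def __init__(self) -> None:
        self.word_to_id: Dict[str, int] = {}
        self.category: Dict[str, Optional[str]] = {}

    def add(self, word: str, category: Optional[str] = None) -> int:
        """Add a word if it is new and return its id; existing ids never change."""
        word = word.strip()
        if not word:
            raise ValueError("empty word")
        if word not in self.word_to_id:
            self.word_to_id[word] = len(self.word_to_id)
            self.category[word] = category
        return self.word_to_id[word]

    def add_all(self, words: Iterable[str], category: Optional[str] = None) -> None:
        for word in words:
            self.add(word, category)

    def __contains__(self, word: str) -> bool:
        return word in self.word_to_id

    def __len__(self) -> int:
        return len(self.word_to_id)

    def to_user_dict_lines(self, freq: int = 1000) -> List[str]:
        """Export in id order as 'word freq [tag]' lines (jieba user-dictionary layout)."""
        lines = []
        for word in self.word_to_id:  # dicts keep insertion order, i.e. id order
            tag = self.category[word]
            lines.append(f"{word} {freq}" + (f" {tag}" if tag else ""))
        return lines
```

Because ids are assigned only on first insertion, the same dictionary can be extended in later batches without renumbering words that earlier samples already use. The sample words (an invented substation "东郊变" and line "甲线") are placeholders.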
Step 102: This step segments the power plan texts in the training and test sets using the domain dictionary built in the previous step. In this embodiment, segmentation uses the jieba package for Python; words that are not in the domain dictionary are segmented by jieba's general-purpose logic.
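The embodiment uses jieba; the sketch below swaps in a simpler technique, forward maximum matching, to show how dictionary-first segmentation works without external dependencies. The function `fmm_segment` and its `fallback` parameter (a stand-in for the general-purpose segmenter that handles text not covered by the dictionary) are hypothetical, and the sample words are invented.

```python
from typing import Callable, List, Set


def fmm_segment(text: str, domain_words: Set[str],
                fallback: Callable[[str], List[str]] = list) -> List[str]:
    """Forward maximum matching against a domain dictionary (hypothetical helper).

    Spans not covered by any dictionary word are handed to `fallback`,
    a stand-in for a general-purpose segmenter (default: split into characters).
    """
    max_len = max((len(w) for w in domain_words), default=1)
    tokens: List[str] = []
    pending: List[str] = []  # characters not covered by the dictionary
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, shrinking one character at a time.
        for size in range(min(max_len, len(text) - i), 0, -1):
            if text[i:i + size] in domain_words:
                match = text[i:i + size]
                break
        if match is None:
            pending.append(text[i])
            i += 1
            continue
        if pending:  # flush the uncovered span before the dictionary word
            tokens.extend(fallback("".join(pending)))
            pending = []
        tokens.append(match)
        i += len(match)
    if pending:
        tokens.extend(fallback("".join(pending)))
    return tokens
```

For example, `fmm_segment("断开东郊变甲线开关", {"东郊变", "甲线", "开关"})` keeps the three dictionary words whole and splits the uncovered "断开" into characters. With jieba itself, the equivalent is to register the domain words with `jieba.load_userdict` or `jieba.add_word` before calling `jieba.lcut`.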
Step 103: This step labels the training data, mainly by hand. Each word of the text segmented in the previous step is assigned a label, which generally denotes the component that the word represents in the plan text. Labeling accuracy directly affects model performance, so this step is critical, and the labeled results must be corrected by domain experts.
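The sample format from claim 3 (one word and its slot name per line, plans separated by a blank line) can be written and read back as below. A single space as the separator, the function names, and the slot names in the example are assumptions; the patent does not specify them.

```python
from typing import List, Sequence, Tuple

Sample = List[Tuple[str, str]]  # (word, slot name) pairs for one plan


def dump_samples(samples: Sequence[Sample], sep: str = " ") -> str:
    """Write each plan as 'word<sep>slot' lines; plans are separated by a blank line."""
    blocks = ["\n".join(f"{word}{sep}{slot}" for word, slot in sample)
              for sample in samples]
    return "\n\n".join(blocks) + "\n"


def load_samples(text: str, sep: str = " ") -> List[Sample]:
    """Parse the format above back into a list of labeled plans."""
    samples: List[Sample] = []
    current: Sample = []
    for line in text.splitlines():
        line = line.strip()
        if not line:  # a blank line closes the current plan
            if current:
                samples.append(current)
                current = []
            continue
        word, slot = line.rsplit(sep, 1)  # the slot name is the last field
        current.append((word, slot))
    if current:
        samples.append(current)
    return samples
```

Round-tripping through `dump_samples` and `load_samples` is a cheap check that expert corrections have not broken the file layout before training.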
Step 104: This step maintains and updates the power plan domain dictionary by adding new words from the current batch of data for later use. An expert must proofread these additions manually.
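Candidate new words for the expert's proofreading could be gathered from a segmented batch as sketched here. The function name and the thresholds `min_count` and `min_len` are assumptions; the patent says only that new words from the batch are added to the dictionary after manual review.

```python
from collections import Counter
from typing import Iterable, List, Sequence, Set, Tuple


def propose_new_words(batch: Iterable[Sequence[str]], known: Set[str],
                      min_count: int = 2, min_len: int = 2) -> List[Tuple[str, int]]:
    """Return (word, count) candidates missing from the dictionary, most frequent first.

    The output is meant for expert proofreading, not for automatic insertion.
    """
    counts = Counter(
        word for sentence in batch for word in sentence
        if word not in known and len(word) >= min_len
    )
    candidates = [(w, c) for w, c in counts.items() if c >= min_count]
    # Sort by descending frequency, then by the word itself for a stable review order.
    return sorted(candidates, key=lambda wc: (-wc[1], wc[0]))
```

Words the expert approves can then be added to the dictionary from Step 101 so that the next batch is segmented with them.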
Step 105: Steps 101 to 104 are data processing steps that prepare samples for training the neural network; this step trains the network. The embodiment implements the LSTM neural network model in PyTorch. The input is a 100-dimensional vector and the output is an N-dimensional vector, where N is the total number of label types. Training runs for 4 epochs of 1000 iterations each on a GPU, and the training time is proportional to the amount of training data.
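The training step can be sketched in PyTorch, the framework named in the embodiment. Only the 100-dimensional word vectors, the N-way output per word, the bidirectional LSTM, and the schedule of 4 epochs of 1000 iterations come from the patent; the hidden size of 128, the Adam optimizer, the learning rate, the cross-entropy loss, and the names `BiLSTMTagger` and `train` are assumptions.

```python
from typing import Optional, Sequence, Tuple

import torch
from torch import nn


class BiLSTMTagger(nn.Module):
    """Hypothetical BiLSTM tagger: word ids -> per-word scores over N slot names."""

    def __init__(self, vocab_size: int, num_tags: int, embed_dim: int = 100,
                 hidden_dim: int = 128, pad_idx: int = 0) -> None:
        super().__init__()
        # Each word id is mapped to a 100-dimensional vector (the patent's input size).
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=pad_idx)
        # Forward and backward LSTMs each get half the hidden size; outputs are concatenated.
        lstm_hidden = hidden_dim // 2
        self.lstm = nn.LSTM(embed_dim, lstm_hidden, batch_first=True, bidirectional=True)
        # Project every time step to N scores (N = number of label types).
        self.proj = nn.Linear(2 * lstm_hidden, num_tags)

    def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
        x = self.embedding(token_ids)  # (batch, seq_len, embed_dim)
        h, _ = self.lstm(x)            # (batch, seq_len, 2 * lstm_hidden)
        return self.proj(h)            # (batch, seq_len, num_tags)


def train(model: nn.Module, batches: Sequence[Tuple[torch.Tensor, torch.Tensor]],
          epochs: int = 4, iters_per_epoch: int = 1000, lr: float = 1e-3,
          ignore_tag: int = -100, device: Optional[str] = None) -> float:
    """Train for epochs x iters_per_epoch steps, cycling through the batches."""
    device = device or ("cuda" if torch.cuda.is_available() else "cpu")
    model.to(device).train()
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    # Padded positions carry the tag id `ignore_tag` and are excluded from the loss.
    loss_fn = nn.CrossEntropyLoss(ignore_index=ignore_tag)
    last_loss = float("nan")
    for _ in range(epochs):
        for step in range(iters_per_epoch):
            x, y = batches[step % len(batches)]
            x, y = x.to(device), y.to(device)
            logits = model(x)
            loss = loss_fn(logits.reshape(-1, logits.size(-1)), y.reshape(-1))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            last_loss = loss.item()
    return last_loss
```

Each batch is a pair of `(batch, seq_len)` tensors of word ids and tag ids; padded tag positions should hold `-100` so that `CrossEntropyLoss` skips them.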
Step 106: After training, the model is evaluated on the test set. The evaluation criteria are the average labeling accuracy and the accuracy for each label type. In this embodiment, the average accuracy reaches 90%.
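The two evaluation criteria can be computed as below. This sketch reads "average accuracy" as token-level accuracy over all labels and "accuracy for each label type" as the fraction of words with a given gold label that are predicted correctly; both readings, the function name, and the placeholder labels in the test are assumptions.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple


def tag_accuracy(gold: Sequence[Sequence[str]],
                 pred: Sequence[Sequence[str]]) -> Tuple[float, Dict[str, float]]:
    """Token-level accuracy overall and per gold label type."""
    if len(gold) != len(pred):
        raise ValueError("gold and pred must contain the same number of sentences")
    total = correct = 0
    per_total: Counter = Counter()
    per_correct: Counter = Counter()
    for g_seq, p_seq in zip(gold, pred):
        if len(g_seq) != len(p_seq):
            raise ValueError("sentence lengths differ")
        for g, p in zip(g_seq, p_seq):
            total += 1
            per_total[g] += 1
            if g == p:
                correct += 1
                per_correct[g] += 1
    overall = correct / total if total else 0.0
    per_tag = {tag: per_correct[tag] / n for tag, n in per_total.items()}
    return overall, per_tag
```

Reporting the per-type figures alongside the overall number shows whether rare slot names are being learned or are hidden behind frequent ones.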
FIG. 2 is a schematic diagram of the LSTM neural network model for power plan text sequence labeling according to an embodiment of the present invention. As shown in FIG. 2:
the input and output of the network model are the segmented sentence and its label sequence, respectively. Each input word is first mapped to its corresponding vector, and the vectors are then processed by the network. The LSTM in this embodiment is bidirectional, which suits sequence-to-sequence problems such as this one, where every input word receives an output label. In addition, the gated design of the LSTM alleviates the long-term dependency problem.
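Mapping words to ids before the vector lookup, and turning the network's per-word scores back into slot names, can be sketched with plain Python helpers. The reserved `<pad>` and `<unk>` entries, padding with id 0, and all function names are assumptions for illustration.

```python
from typing import Dict, List, Sequence

PAD, UNK = "<pad>", "<unk>"


def build_vocab(sentences: Sequence[Sequence[str]]) -> Dict[str, int]:
    """Assign ids to words in order of first appearance; ids 0 and 1 are reserved."""
    vocab = {PAD: 0, UNK: 1}
    for sentence in sentences:
        for word in sentence:
            vocab.setdefault(word, len(vocab))
    return vocab


def encode_batch(sentences: Sequence[Sequence[str]], vocab: Dict[str, int]) -> List[List[int]]:
    """Map words to ids (unknown words -> <unk>) and right-pad to the longest sentence."""
    width = max((len(s) for s in sentences), default=0)
    return [[vocab.get(w, vocab[UNK]) for w in s] + [vocab[PAD]] * (width - len(s))
            for s in sentences]


def decode_tags(scores: Sequence[Sequence[Sequence[float]]], lengths: Sequence[int],
                id_to_tag: Sequence[str]) -> List[List[str]]:
    """Pick the highest-scoring label per word, skipping padded positions."""
    return [[id_to_tag[max(range(len(s)), key=s.__getitem__)] for s in sent[:n]]
            for sent, n in zip(scores, lengths)]
```

Reserving an `<unk>` id matters here because plan texts keep introducing new station and line names that were absent from the training vocabulary.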

Claims (4)

1. A power plan text labeling technique based on a long short-term memory (LSTM) neural network, characterized in that the LSTM neural network performs sequence labeling on power plan text, and the text labeling method comprises:
establishing a domain-specific dictionary for the power plan field;
a word segmentation method for power plan text sequence labeling;
a data labeling method and sample format for power plan text;
an LSTM network structure and training method for power plan text sequence labeling.
2. The text labeling technique according to claim 1, wherein the application field is power plan text, and the characteristics that make power plan text difficult to process specifically comprise:
power plan text contains a large number of domain-specific terms, including place names, substation names, line names, abbreviations, and the like.
In addition to these domain-specific terms, the syntax of plan text differs from everyday language, so general-purpose tools process plan text poorly. The invention therefore provides a word segmentation method for power plan text that segments words based on a power plan domain dictionary. The dictionary is built by first summarizing rules from a large number of power plans and then having experts correct the result manually, and it is continually updated and expanded with each use.
3. The power plan text data labeling method and sample format according to claim 1, specifically comprising:
the method classifies the application scenarios involved in power plans and labels different components for the plan texts of different scenarios. Power experts analyzed the plan texts and defined ten scenario types, and the text of each scenario is labeled with a finite set of components. These components are also called slot names.
In the sample format, one plan text is converted into multiple lines, each line consisting of two parts: a word of the original text and its slot name. Consecutive plans are separated by a blank line.
4. The LSTM neural network for power plan text sequence labeling and its training method according to claim 1, specifically comprising:
the LSTM neural network used for power plan text sequence labeling is a bidirectional LSTM neural network;
the training method comprises parameter settings, the number of training iterations (epochs), and accuracy analysis.
CN201910909528.3A 2019-09-23 2019-09-23 Sequence labeling method for electric power plan text based on long-time memory network Pending CN112632270A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910909528.3A CN112632270A (en) 2019-09-23 2019-09-23 Sequence labeling method for electric power plan text based on long-time memory network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910909528.3A CN112632270A (en) 2019-09-23 2019-09-23 Sequence labeling method for electric power plan text based on long-time memory network

Publications (1)

Publication Number Publication Date
CN112632270A true CN112632270A (en) 2021-04-09

Family

ID=75283122

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910909528.3A Pending CN112632270A (en) 2019-09-23 2019-09-23 Sequence labeling method for electric power plan text based on long-time memory network

Country Status (1)

Country Link
CN (1) CN112632270A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170323636A1 (en) * 2016-05-05 2017-11-09 Conduent Business Services, Llc Semantic parsing using deep neural networks for predicting canonical forms
CN110232192A (en) * 2019-06-19 2019-09-13 中国电力科学研究院有限公司 Electric power term names entity recognition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination