CN113822599A - Power industry policy management method based on classification tree fusion technology - Google Patents


Publication number
CN113822599A
Authority
CN
China
Prior art keywords
power industry
information
industry policy
word
policy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111256627.XA
Other languages
Chinese (zh)
Inventor
朱峰
左强
邹云峰
祝宇楠
范环宇
蔡明明
寇文心
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
State Grid Jiangsu Electric Power Co Ltd
Original Assignee
State Grid Jiangsu Electric Power Co ltd Marketing Service Center
State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Jiangsu Electric Power Co ltd Marketing Service Center, State Grid Jiangsu Electric Power Co Ltd filed Critical State Grid Jiangsu Electric Power Co ltd Marketing Service Center
Priority to CN202111256627.XA priority Critical patent/CN113822599A/en
Publication of CN113822599A publication Critical patent/CN113822599A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0637 Strategic management or analysis, e.g. setting a goal or target of an organisation; Planning actions based on goals; Analysis or evaluation of effectiveness of goals
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Energy or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02E REDUCTION OF GREENHOUSE GAS [GHG] EMISSIONS, RELATED TO ENERGY GENERATION, TRANSMISSION OR DISTRIBUTION
    • Y02E40/00 Technologies for an efficient electrical power generation, transmission or distribution
    • Y02E40/70 Smart grids as climate change mitigation technology in the energy generation sector
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y04 INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
    • Y04S SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
    • Y04S10/00 Systems supporting electrical power generation, transmission or distribution
    • Y04S10/50 Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Economics (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computing Systems (AREA)
  • Strategic Management (AREA)
  • Evolutionary Computation (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • Probability & Statistics with Applications (AREA)
  • Primary Health Care (AREA)
  • Water Supply & Treatment (AREA)
  • Public Health (AREA)
  • Development Economics (AREA)
  • Databases & Information Systems (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application discloses a power industry policy management method based on classification tree fusion technology, comprising the following steps: acquiring power industry policy texts and preprocessing the data; encoding the preprocessed power industry policy text information; setting information attention weights for the different sentences in a power industry policy text; classifying the power industry policy texts based on the encoding and the attention weights of the policy text information; extracting information from the different types of power industry policy texts; and fusing and assembling the extracted information of the different types to realize power industry policy management. By managing power industry policies with classification tree fusion technology, the method meets the needs of digital and intelligent transformation and of efficient, unified policy management in the power industry, realizes classified management of power industry policies, improves the efficiency of policy management, and supports quality and efficiency improvements in related power industry services.

Description

Power industry policy management method based on classification tree fusion technology
Technical Field
The invention belongs to the technical field of information perception and identification of the power industry, and relates to a power industry policy management method based on a classification tree fusion technology.
Background
The power industry is a basic, pillar and strategic industry of the national economy, and the development of power informatization, the smart grid and the power Internet of Things is an important means of realizing the revolution in energy production, consumption, technology and systems in China.
Power industry policy differs from general policy in that its function is more complex: it is an important component of the national economic system and an important means of national economic regulation. Power industry policies are numerous and complex. For example, electricity price policy is an important means of national economic regulation, and the different types of electricity prices are often adjusted as economic policy changes from period to period. Executing power industry policies accurately and in full, so that the lawful interests of the enterprise are not lost and the electricity use of power customers is handled accurately, is therefore a key task for power enterprises.
Natural language processing technology is gradually maturing, but even as the power industry vigorously promotes intelligent and digital transformation, natural language processing is still rarely applied in the power industry, especially in power industry policy management.
Therefore, taking into account both the digital and intelligent transformation needs of the power industry and the need for unified management of power industry policies, a technical method for managing power industry policies efficiently based on natural language processing is urgently needed to support the management and implementation of those policies.
Disclosure of Invention
To overcome the defects of the prior art, the application provides a power industry policy management method based on classification tree fusion technology.
To achieve the above purpose, the invention adopts the following technical scheme:
a power industry policy management method based on classification tree fusion technology comprises the following steps:
step 1: acquiring power industry policy texts and preprocessing the data;
step 2: encoding the preprocessed power industry policy text information;
step 3: setting information attention weights for the different sentences in a power industry policy text;
step 4: classifying the power industry policy texts with the classification tree fusion technology, based on the encoding and the attention weights of the policy text information;
step 5: extracting triple information from the different types of power industry policy texts;
step 6: fusing and assembling the extracted information of the different types based on an entity alignment algorithm, thereby realizing power industry policy management.
The invention further comprises the following preferred embodiments:
preferably, step 1 specifically comprises:
step 1.1: acquiring a power industry policy text, segmenting it into words with the jieba word segmentation tool, and deleting stop words from it using a stop word list;
step 1.2: after the preprocessing of step 1.1, looking up each word of a sentence in the vocabulary to obtain its position, and mapping each word to its word vector through the word embedding matrix;
step 1.3: based on the word vectors, extracting information representations of the sentences in the power industry policy text with a convolutional neural network.
Preferably, step 1.3 specifically comprises:
step 1.3.1: concatenating the word vectors of the words in a power industry policy text sentence to construct the sentence vector matrix of that sentence;
step 1.3.2: for the sentence vector matrix, setting several convolution kernels of different sizes in the convolution layer to extract common information representations among different words;
step 1.3.3: extracting fixed-length information representations from sentences of different lengths using K-Max pooling and Padding.
Preferably, step 2 is specifically:
sequentially inputting the information representation of each sentence into a BiLSTM network or a GRU network, in the order in which the sentences appear in the power industry policy text, to encode the power industry policy text information.
Preferably, step 3 is specifically:
setting information attention weights for the different sentences in the power industry policy text using an Attention mechanism, and outputting a power industry policy text vector encoding to which the attention weight information has been added.
Preferably, step 4 is specifically:
inputting the power industry policy text vector encoding with the attention weight information into a Softmax classifier to obtain a one-hot vector representation of the category to which the power industry policy text belongs, thereby classifying the power industry policy text.
Preferably, step 5 specifically includes:
step 5.1: extracting triple information from the power industry policy text with the open domain triple extraction tool Open-IE: first extracting all possible Subjects and Predicates of the different types of power industry policies, then judging the association between the Subjects and Predicates, and finally extracting the Objects corresponding to the Subjects and Predicates;
step 5.2: extracting triple information from the power industry policy text with the closed domain triple extraction tool Close-IE: first extracting the Subject and the Object, and then classifying the relationship between the Subject and the Object.
Preferably, step 5.1 specifically comprises:
step 5.1.1: the encoding layer Encoder-Layer acquires the context information of the sentence;
step 5.1.2: the entity extraction layer EntityRelation-Layer extracts all possible Subjects and Predicates;
step 5.1.3: the Multihead-Layer finds all Subjects and Predicates that may be related;
step 5.1.4: the Object-Layer extracts the corresponding Object according to the specified Subject and Predicate;
step 5.1.5: Triple-Result extracts the final set of (Subject, Predicate, Object) triples in the sentence according to steps 5.1.1-5.1.4.
Preferably, in step 5.1.2, the start positions and end positions of the Subject and the Predicate are each extracted in Span mode, with the following formulas:
$$P_i^{\mathrm{start\_s}} = \mathrm{sigmoid}(W_{\mathrm{start}} h_i + b_{\mathrm{start}})$$
$$P_i^{\mathrm{end\_s}} = \mathrm{sigmoid}(W_{\mathrm{end}} h_i + b_{\mathrm{end}})$$
$$P_i^{\mathrm{start\_p}} = \mathrm{sigmoid}(W_{\mathrm{start}} h_i + b_{\mathrm{start}})$$
$$P_i^{\mathrm{end\_p}} = \mathrm{sigmoid}(W_{\mathrm{end}} h_i + b_{\mathrm{end}})$$
where $P_i^{\mathrm{start\_s}}$ is the probability that the i-th token in the sentence is the start position of a Subject, $P_i^{\mathrm{end\_s}}$ is the probability that the i-th token is the end position of a Subject, $P_i^{\mathrm{start\_p}}$ is the probability that the i-th token is the start position of a Predicate, $P_i^{\mathrm{end\_p}}$ is the probability that the i-th token is the end position of a Predicate, $h_i$ is the encoding of the i-th token in the sentence after BERT, $W_{(\cdot)}$ are the model weights to be trained, and $b_{(\cdot)}$ are the biases;
the formula used in step 5.1.3 is as follows:
$$P_{i,j} = \mathrm{sigmoid}(h_i, h_j)$$
where $h_i$ is the encoding of the i-th feature in the sentence, representing a Subject feature, $h_j$ is the encoding of the j-th feature in the sentence, representing a Predicate feature, and $P_{i,j}$ is the probability that $(h_i, h_j)$ can form a relation;
the formulas used in step 5.1.4 are as follows:
$$P_i^{\mathrm{start\_o}} = \mathrm{sigmoid}(W_{\mathrm{start\_o}}(h_i, V_s, V_p) + b_{\mathrm{start\_o}})$$
$$P_i^{\mathrm{end\_o}} = \mathrm{sigmoid}(W_{\mathrm{end\_o}}(h_i, V_s, V_p) + b_{\mathrm{end\_o}})$$
where $P_i^{\mathrm{start\_o}}$ is the probability that the i-th token in the sentence is the start position of the Object, $P_i^{\mathrm{end\_o}}$ is the probability that the i-th token is the end position of the Object, $V_s$ is the sum of the head and tail features of the Subject, and $V_p$ is the sum of the head and tail features of the Predicate.
Preferably, step 5.2 specifically comprises:
step 5.2.1: the BERT encoding layer BERT-Layer acquires the context information of the sentence;
step 5.2.2: the entity extraction layer Entity-Layer extracts all possible Subjects and Objects;
step 5.2.3: the Multihead-Layer finds the possible relations among all the different tokens in the sentence;
step 5.2.4: Triple-Result extracts the final set of (Subject, Predicate, Object) triples in the sentence according to steps 5.2.1-5.2.3.
The beneficial effects achieved by the application are as follows:
based on classification tree fusion technology, the invention provides a novel open domain triple information extraction method and realizes power industry policy management. It meets the needs of digital and intelligent transformation and of efficient, unified policy management in the power industry, realizes classified management of power industry policies, improves the efficiency of policy management, and supports quality and efficiency improvements in related power industry services.
Drawings
FIG. 1 is a flowchart of a power industry policy management method based on classification tree fusion technology according to the present invention;
FIG. 2 is a diagram of a power industry policy text classifier according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating extraction of Open-IE Open domain information in an embodiment of the present invention;
FIG. 4 is a schematic diagram illustrating extraction of Close-IE closed domain information in an embodiment of the present invention.
Detailed Description
The present application is further described below with reference to the accompanying drawings. The following examples are only for illustrating the technical solutions of the present invention more clearly, and the protection scope of the present application is not limited thereby.
As shown in FIG. 1, the power industry policy management method based on classification tree fusion technology of the present invention includes the following steps:
Step 1: acquiring power industry policy texts and preprocessing the data, which specifically includes the following steps:
Step 1.1: acquiring a power industry policy text and preprocessing the data:
a power industry policy text is acquired and segmented into words with the jieba word segmentation tool, and stop words are deleted from it using a stop word list;
for example, for the sentence "this year's thermal power electricity price in Jiangsu Province is expected to be adjusted", word segmentation and stop word removal yield the words "this year", "Jiangsu Province", "thermal power", "electricity price", "expected", "will" and "adjusted".
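The stop word removal in step 1.1 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the token list is assumed to come from a segmenter (in practice the jieba tool named above), and the function name `remove_stop_words`, the English glosses and the sample stop word list are all hypothetical.

```python
def remove_stop_words(tokens, stop_words):
    """Drop tokens that appear in the stop word list, keeping the original order."""
    stop_set = set(stop_words)  # set membership keeps the filter fast
    return [tok for tok in tokens if tok not in stop_set]


# Hypothetical segmenter output for the example sentence (English glosses for readability).
segmented = ["this year", "Jiangsu Province", "of", "thermal power",
             "electricity price", "expected", "will", "adjusted"]
# Hypothetical stop word list; a real one would normally be loaded from a file.
stop_words = ["of", "the"]

words = remove_stop_words(segmented, stop_words)
```

The filter preserves word order, which matters because later steps consume the words as a sequence.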
Step 1.2: word embedding:
as shown in the word embedding module in FIG. 2, the sentences of a power industry policy document are input into the word embedding layer one after another; each word in a sentence is looked up in the vocabulary to obtain its position, and is then mapped to its word vector through the word embedding matrix.
The vocabulary is the set of all possible words, in which the position of any word can be looked up; for example, the position of the word "this year" in the vocabulary is 156.
The word embedding matrix is a two-dimensional matrix of dimension [vocabulary size, word vector length]; indexing it with a word's position in the vocabulary gives the word vector of the input word.
A word vector is a vector of fixed size, and each word in the vocabulary corresponds to a different word vector.
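The lookup described in step 1.2 amounts to indexing rows of a matrix. The sketch below assumes a toy vocabulary built from the example sentence and a randomly initialised embedding matrix; the helper names `build_vocab` and `embed` and the vector length of 4 are illustrative choices, not values from the patent.

```python
import random


def build_vocab(words):
    """Assign each distinct word a position (row index) in the vocabulary."""
    vocab = {}
    for w in words:
        if w not in vocab:
            vocab[w] = len(vocab)
    return vocab


def embed(sentence, vocab, embedding_matrix):
    """Map each word to its row of the embedding matrix via its vocabulary position."""
    return [embedding_matrix[vocab[w]] for w in sentence]


sentence = ["this year", "Jiangsu Province", "thermal power",
            "electricity price", "expected", "will", "adjusted"]
vocab = build_vocab(sentence)
vector_len = 4  # illustrative; real word vectors are much longer
rng = random.Random(0)
# Embedding matrix of dimension [vocabulary size, word vector length], random here.
embedding_matrix = [[rng.uniform(-1, 1) for _ in range(vector_len)]
                    for _ in range(len(vocab))]
# Result has dimension [sentence length, word vector length].
sentence_matrix = embed(sentence, vocab, embedding_matrix)
```

In a trained model the matrix rows would be learned parameters rather than random numbers.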
Step 1.3: extracting sentence representation information with the convolutional neural network:
as shown in the CNN module in FIG. 2, a convolutional neural network extracts the information representations of the sentences in a power industry policy text, which specifically includes:
Step 1.3.1: concatenating the word vectors of the words in a power industry policy text sentence to construct the sentence vector matrix of that sentence;
the sentence "this year's thermal power electricity price in Jiangsu Province is expected to be adjusted" is again used as an example.
Step 1.2 gives the word vectors of the 7 words contained in the sentence, and concatenating these 7 word vectors gives the sentence vector matrix of the sentence.
The sentence vector matrix is a two-dimensional matrix of dimension [sentence length, word vector dimension].
Step 1.3.2: setting several convolution kernels of different sizes in the convolution layer to extract common information representations among different words;
for the sentence vector matrix, convolution kernels of 5 sizes are set, with dimensions [1, word vector dimension], [2, word vector dimension], [3, word vector dimension], [4, word vector dimension] and [5, word vector dimension], and 5 kernels of each size. Taking a kernel of dimension [3, word vector dimension] as an example, the kernel extracts an information representation across 3 adjacent words, so setting kernels of several different sizes mines more information among the words.
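A kernel of dimension [k, word vector dimension] slides along the word axis and produces one value per window of k consecutive words. The following sketch shows that operation in plain Python for the five kernel sizes, with 5 random kernels per size. The function name `conv_over_words`, the word vector dimension of 4 and the random weights are assumptions, and bias terms and activation functions are omitted for brevity.

```python
import random


def conv_over_words(sentence_matrix, kernel):
    """Slide a [k, dim] kernel over the word axis; one output per window of k words."""
    k = len(kernel)
    outputs = []
    for start in range(len(sentence_matrix) - k + 1):
        total = 0.0
        for r in range(k):
            # Dot product of kernel row r with the r-th word vector in the window.
            total += sum(w * x for w, x in zip(kernel[r], sentence_matrix[start + r]))
        outputs.append(total)
    return outputs


rng = random.Random(0)
dim = 4  # illustrative word vector dimension
sentence_matrix = [[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(7)]

feature_maps = {}
for k in (1, 2, 3, 4, 5):
    # 5 random kernels of dimension [k, dim] for each kernel size.
    kernels = [[[rng.uniform(-1, 1) for _ in range(dim)] for _ in range(k)]
               for _ in range(5)]
    feature_maps[k] = [conv_over_words(sentence_matrix, kern) for kern in kernels]
```

For a 7-word sentence a size-k kernel yields 7 - k + 1 values, so the feature maps have different lengths, which is what the pooling in step 1.3.3 then evens out.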
Step 1.3.3: extracting fixed-length information representations from sentences of different lengths in the pooling layer through K-Max pooling and Padding.
For example, take two sentences whose sentence vector matrices have dimensions [7, word vector dimension] and [18, word vector dimension], where 7 and 18 are the sentence lengths after word segmentation and the word vector dimension is, for example, 768;
after K-Max pooling and Padding, the two sentence vector matrices are compressed into 2 sentence vectors of the same dimension, for example [1, 200].
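K-Max pooling keeps the k largest activations of a feature map in their original order, and Padding fills in when a feature map has fewer than k values. A minimal sketch, assuming right-padding with zeros and a hypothetical helper name `k_max_pool`:

```python
def k_max_pool(values, k, pad_value=0.0):
    """Keep the k largest values in their original order; right-pad if fewer than k exist."""
    if len(values) <= k:
        return list(values) + [pad_value] * (k - len(values))
    # Pick the indices of the k largest values, then restore sequence order.
    largest = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(largest)]


short_map = [0.2, 0.9, 0.1]                   # feature map from a short sentence
long_map = [0.3, 0.8, 0.1, 0.7, 0.5, 0.9]     # feature map from a longer sentence
pooled_short = k_max_pool(short_map, k=4)     # padded up to length 4
pooled_long = k_max_pool(long_map, k=4)       # reduced down to length 4
```

Both outputs have length k regardless of sentence length, which is what allows every sentence to be represented by a vector of the same size.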
Step 2: encoding the preprocessed power industry policy text information, which specifically includes the following:
the BiLSTM network or the GRU network encodes the power industry policy text information:
the BiLSTM network and the GRU network are sequence models, and each sequence unit of the BiLSTM network outputs its hidden state h.
Suppose a text describing thermal power generation policy is to be encoded, the document contains at most 50 sentences, and each sentence has obtained its sentence vector (of length 200) through the convolutional network.
These 50 sentence vectors of length 200 are then sequence-modeled by the BiLSTM, which outputs the vector representations of all sequence units.
The output has dimension [50, 200], where 50 is the number of sequence units of the BiLSTM network and 200 is the length of the output vector of each sequence unit.
As shown in the BiLSTM network in FIG. 2, a BiLSTM network can effectively capture long-distance information dependencies. Therefore, after a fixed-length information representation has been extracted from each sentence in the power industry policy text, the representations are input into the BiLSTM network one by one, in the order of the sentences in the text, to encode the power industry policy text information.
The BiLSTM network or the GRU network requires no pre-training; it outputs the encoding directly from the input representations.
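The bidirectional encoding in step 2 can be illustrated at the level of data flow. In the sketch below a simple tanh recurrent cell stands in for the LSTM or GRU cell, which is a deliberate simplification, not the network the patent uses; the sizes (5 sentences, sentence vectors of length 6, hidden size 3) and all names are hypothetical.

```python
import math
import random


def rnn_pass(inputs, w_in, w_rec):
    """Run a simple tanh recurrent cell over the inputs; return the hidden state at each step."""
    hidden_size = len(w_rec)
    h = [0.0] * hidden_size
    states = []
    for x in inputs:
        h = [math.tanh(sum(w_in[j][d] * x[d] for d in range(len(x)))
                       + sum(w_rec[j][m] * h[m] for m in range(hidden_size)))
             for j in range(hidden_size)]
        states.append(h)
    return states


def bidirectional_encode(inputs, fwd_params, bwd_params):
    """Concatenate forward and backward hidden states for each sentence position."""
    forward = rnn_pass(inputs, *fwd_params)
    backward = rnn_pass(inputs[::-1], *bwd_params)[::-1]  # re-align to input order
    return [f + b for f, b in zip(forward, backward)]


def random_params(rng, in_dim, hidden):
    """Random input and recurrent weight matrices (stand-ins for trained weights)."""
    w_in = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(hidden)]
    w_rec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(hidden)]
    return w_in, w_rec


rng = random.Random(0)
num_sentences, sent_dim, hidden = 5, 6, 3
sentence_vectors = [[rng.uniform(-1, 1) for _ in range(sent_dim)]
                    for _ in range(num_sentences)]
encoded = bidirectional_encode(sentence_vectors,
                               random_params(rng, sent_dim, hidden),
                               random_params(rng, sent_dim, hidden))
```

Each position's output combines a state that has seen the preceding sentences with one that has seen the following sentences, so the encoding of a sentence reflects its context on both sides.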
Step 3: setting information attention weights for the different sentences in a power industry policy text, which specifically includes the following:
as shown in the Attention mechanism in FIG. 2, the Attention mechanism sets information attention weights for the different sentences in the power industry policy text and outputs the power industry policy text vector encoding with the attention weight information added.
The input of the Attention mechanism is the output of the BiLSTM network. In the Attention mechanism, the output $h_{i,t}$ of the BiLSTM network is first input into a fully connected layer to obtain the hidden-layer representation $u_{i,t}$ of the attention layer.
Then the attention weight $\alpha_{i,t}$ of the corresponding information in the document is calculated through softmax.
Finally, the outputs of the BiLSTM network are summed, weighted by the attention weights, to obtain the weighted vector representation $s_i$.
$$u_{i,t} = \tanh(W_w h_{i,t} + b_w)$$
[Equation image not reproduced: softmax computation of the attention weight $\alpha_{i,t}$ from $u_{i,t}$]
[Equation image not reproduced: attention-weighted sum of the BiLSTM outputs $h_{i,t}$ giving $s_i$]
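The three attention operations (fully connected layer with tanh, softmax weights, weighted sum) can be sketched as below. The source does not reproduce how each $u_{i,t}$ is turned into a scalar score before the softmax, so this sketch assumes a dot product with a learned context vector, a common choice; that vector, the function name `attention_pool` and all numeric values are assumptions.

```python
import math


def attention_pool(hidden_states, w, b, context):
    """Return (attention weights, weighted sum) for a list of hidden states."""
    scores = []
    for h in hidden_states:
        # u = tanh(W h + b): hidden representation of the attention layer.
        u = [math.tanh(sum(w[j][d] * h[d] for d in range(len(h))) + b[j])
             for j in range(len(w))]
        # Assumed scoring: dot product of u with a context vector.
        scores.append(sum(uj * cj for uj, cj in zip(u, context)))
    # Softmax over the scores (shifted by the maximum for numerical stability).
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Weighted sum of the hidden states.
    pooled = [sum(a * h[d] for a, h in zip(weights, hidden_states))
              for d in range(len(hidden_states[0]))]
    return weights, pooled


hidden_states = [[0.1, 0.4], [0.9, -0.2], [0.3, 0.3]]  # toy BiLSTM outputs
w = [[1.0, 0.0], [0.0, 1.0]]
b = [0.0, 0.0]
context = [1.0, 1.0]  # hypothetical context vector
weights, pooled = attention_pool(hidden_states, w, b, context)
```

The weights sum to 1, so the pooled vector is a convex combination of the hidden states in which more relevant positions contribute more.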
Step 4: classifying the power industry policy texts with the classification tree fusion technology, based on the encoding and the attention weights of the policy text information, which specifically includes the following:
the Softmax classifier classifies the power industry policy texts:
as shown in the Softmax classifier module in FIG. 2, the power industry policy text vector encoding with the attention weight information is input into the Softmax classifier to obtain a one-hot vector representation of the category to which the power industry policy text belongs, thereby classifying the power industry policy text.
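A Softmax classifier applies a linear layer to the document vector, normalises the scores into probabilities and reports the most probable category as a one-hot vector. The sketch below assumes three hypothetical policy categories and hand-picked weights; the function name `classify` is illustrative.

```python
import math


def classify(doc_vector, w, b):
    """Linear layer + softmax; returns class probabilities and the one-hot prediction."""
    logits = [sum(wi * x for wi, x in zip(row, doc_vector)) + bi
              for row, bi in zip(w, b)]
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # shifted for numerical stability
    total = sum(exps)
    probs = [e / total for e in exps]
    best = probs.index(max(probs))
    one_hot = [1 if c == best else 0 for c in range(len(probs))]
    return probs, one_hot


doc_vector = [0.5, -1.0]                       # toy attention-weighted document vector
w = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]      # one weight row per hypothetical category
b = [0.0, 0.0, 0.0]
probs, one_hot = classify(doc_vector, w, b)
```

With these toy weights the logits are 0.5, -1.0 and -0.5, so the first category is selected.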
Step 5: extracting triple information from the different types of power industry policy texts, which specifically includes the following steps:
Step 5.1: extracting triple information from the power industry policy text with the open domain triple extraction tool Open-IE, whose structure is shown in FIG. 3;
because methods for open domain triple information extraction are currently lacking, a new open domain triple information extraction method is proposed.
The method first extracts all possible Subjects and Predicates of the different types of power industry policies, then judges the association between the Subjects and Predicates, and finally extracts the Objects corresponding to the Subjects and Predicates.
Step 5.1 specifically includes the following steps:
Step 5.1.1: Encoder-Layer:
the text representation capability of the BiLSTM network is weak on the triple extraction task, and its overall effect is poor. Therefore, unlike step 2, which uses the BiLSTM network as the encoding layer, the triple extraction task uses BERT as the encoding layer so as to better acquire the context information of the sentence.
In the Encoder-Layer of FIG. 3, BERT is used as the feature extraction layer to further improve model performance and better acquire the context information of the sentence.
Step 5.1.2: EntityRelation-Layer:
in the entity extraction layer EntityRelation-Layer of FIG. 3, the start positions and end positions of the Subject and the Predicate are each extracted in Span mode. The calculation formulas are as follows:
$$P_i^{\mathrm{start\_s}} = \mathrm{sigmoid}(W_{\mathrm{start}} h_i + b_{\mathrm{start}})$$
$$P_i^{\mathrm{end\_s}} = \mathrm{sigmoid}(W_{\mathrm{end}} h_i + b_{\mathrm{end}})$$
$$P_i^{\mathrm{start\_p}} = \mathrm{sigmoid}(W_{\mathrm{start}} h_i + b_{\mathrm{start}})$$
$$P_i^{\mathrm{end\_p}} = \mathrm{sigmoid}(W_{\mathrm{end}} h_i + b_{\mathrm{end}})$$
where $P_i^{\mathrm{start\_s}}$ is the probability that the i-th token in the sentence is the start position of a Subject, $P_i^{\mathrm{end\_s}}$ is the probability that the i-th token is the end position of a Subject, $P_i^{\mathrm{start\_p}}$ is the probability that the i-th token is the start position of a Predicate, $P_i^{\mathrm{end\_p}}$ is the probability that the i-th token is the end position of a Predicate, $h_i$ is the encoding of the i-th token in the sentence after BERT, $W_{(\cdot)}$ are the model weights to be trained, and $b_{(\cdot)}$ are the biases.
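Span-mode extraction scores every token twice, once as a possible start and once as a possible end, and then pairs starts with ends. The sketch below computes the per-token probabilities with the sigmoid formula above on toy encodings. The decoding rule (threshold 0.5, each start paired with the nearest end at or after it) is a common heuristic assumed here, since the source does not state one; the encodings, weights and function names are hypothetical.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def position_probs(token_encodings, w, b):
    """P_i = sigmoid(W h_i + b) for every token encoding h_i."""
    return [sigmoid(sum(wi * hi for wi, hi in zip(w, h)) + b) for h in token_encodings]


def decode_spans(start_probs, end_probs, threshold=0.5):
    """Pair each start above the threshold with the nearest end at or after it."""
    spans = []
    for i, p_start in enumerate(start_probs):
        if p_start < threshold:
            continue
        for j in range(i, len(end_probs)):
            if end_probs[j] >= threshold:
                spans.append((i, j))
                break
    return spans


# Toy stand-ins for BERT token encodings (4 tokens, dimension 2).
token_encodings = [[2.0, -2.0], [0.0, 0.0], [-2.0, 2.0], [-2.0, -2.0]]
w_start, b_start = [3.0, 0.0], -1.0
w_end, b_end = [0.0, 3.0], -1.0

start_probs = position_probs(token_encodings, w_start, b_start)
end_probs = position_probs(token_encodings, w_end, b_end)
subject_spans = decode_spans(start_probs, end_probs)
```

With these toy values only token 0 scores as a start and only token 2 as an end, so a single span covering tokens 0 to 2 is decoded.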
Step 5.1.3: Multihead-Layer:
in the Multihead-Layer of FIG. 3, each token in the sentence may form a relation with other tokens, and the layer finds the Subjects and Predicates of all possible relations. The calculation formula is as follows:
$$P_{i,j} = \mathrm{sigmoid}(h_i, h_j)$$
where $h_i$ is the encoding of the i-th feature in the sentence, representing a Subject feature, $h_j$ is the encoding of the j-th feature in the sentence, representing a Predicate feature, and $P_{i,j}$ is the probability that $(h_i, h_j)$ can form a relation.
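The pairing step scores every (Subject, Predicate) combination and keeps the likely ones. The source writes the score only as sigmoid($h_i$, $h_j$) without saying how the two encodings are combined, so the sketch below assumes a dot product followed by a sigmoid; that choice, the 0.5 threshold, the feature values and the function names are all assumptions.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def pair_probabilities(subject_feats, predicate_feats):
    """P[i][j] = sigmoid(h_i . h_j); the dot product is an assumed combination."""
    return [[sigmoid(sum(a * b for a, b in zip(h_i, h_j))) for h_j in predicate_feats]
            for h_i in subject_feats]


def related_pairs(probs, threshold=0.5):
    """Keep the (subject index, predicate index) pairs whose probability passes the threshold."""
    return [(i, j) for i, row in enumerate(probs)
            for j, p in enumerate(row) if p >= threshold]


subject_feats = [[1.0, 0.0], [0.0, 1.0]]        # toy Subject features
predicate_feats = [[2.0, -2.0], [-2.0, 2.0]]    # toy Predicate features
probs = pair_probabilities(subject_feats, predicate_feats)
pairs = related_pairs(probs)
```

Scoring all pairs lets one Subject be linked to several Predicates and vice versa, which a sentence with multiple facts requires.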
Step 5.1.4: Object-Layer:
in the Object-Layer of fig. 3, this layer extracts the Object corresponding to a specified Subject and Predicate. The calculation formulas are as follows:
$P_i^{start\_o} = \mathrm{sigmoid}(W_{start\_o}(h_i, V_s, V_p) + b_{start\_o})$
$P_i^{end\_o} = \mathrm{sigmoid}(W_{end\_o}(h_i, V_s, V_p) + b_{end\_o})$
wherein $P_i^{start\_o}$ represents the probability that the i-th token in the sentence is the start position of the Object, $P_i^{end\_o}$ represents the probability that the i-th token is the end position of the Object, $V_s$ represents the sum of the head and tail features of the Subject, and $V_p$ represents the sum of the head and tail features of the Predicate.
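A minimal sketch of this conditioning step follows. It assumes that $(h_i, V_s, V_p)$ denotes concatenation of the three vectors, which the patent does not state explicitly, and it uses hypothetical encodings and weights.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def span_feature(encodings, span):
    """Sum of the head and tail token encodings of a span (V_s or V_p)."""
    start, end = span
    return [a + b for a, b in zip(encodings[start], encodings[end])]

def object_probs(encodings, v_s, v_p, weight, bias):
    """P_i = sigmoid(W . [h_i; V_s; V_p] + b), assuming concatenation."""
    probs = []
    for h_i in encodings:
        features = h_i + v_s + v_p   # list concatenation
        probs.append(sigmoid(sum(w * x for w, x in zip(weight, features)) + bias))
    return probs

# Hypothetical encodings for a 4-token sentence.
encodings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
v_s = span_feature(encodings, (0, 0))   # Subject is token 0
v_p = span_feature(encodings, (1, 1))   # Predicate is token 1

w_start_o = [2.0, 2.0, 0.5, 0.0, 0.0, 0.5]   # illustrative, not trained
b_start_o = -5.0
start_o = object_probs(encodings, v_s, v_p, w_start_o, b_start_o)
best_start = max(range(len(start_o)), key=start_o.__getitem__)
print(best_start)
```

The end-position probabilities would be computed the same way with a second weight vector and bias.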
Step 5.1.5: Triple-Result:
in the Triple-Result layer of fig. 3, the final set of (Subject, Predicate, Object) triples in the sentence is extracted from the results of the preceding steps.
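How the outputs of the earlier layers might be combined can be sketched as follows. The data layout (span lists, index pairs, and a dictionary of Object spans keyed by pair) and the example sentence are assumptions made for illustration only.

```python
def span_text(tokens, span):
    """Join the tokens covered by an inclusive (start, end) span."""
    start, end = span
    return " ".join(tokens[start:end + 1])

def assemble_triples(tokens, subject_spans, predicate_spans,
                     related_pairs, objects_by_pair):
    """Build (Subject, Predicate, Object) triples from the layer outputs."""
    triples = []
    for s_idx, p_idx in related_pairs:
        for o_span in objects_by_pair.get((s_idx, p_idx), []):
            triples.append((
                span_text(tokens, subject_spans[s_idx]),
                span_text(tokens, predicate_spans[p_idx]),
                span_text(tokens, o_span),
            ))
    return triples

# Hypothetical sentence and layer outputs.
tokens = ["grid", "companies", "publish", "electricity", "prices"]
subject_spans = [(0, 1)]
predicate_spans = [(2, 2)]
related_pairs = [(0, 0)]
objects_by_pair = {(0, 0): [(3, 4)]}

triples = assemble_triples(tokens, subject_spans, predicate_spans,
                           related_pairs, objects_by_pair)
print(triples)
```

For Chinese text the tokens would be joined without spaces.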
Step 5.2: Extract the triple information of the power industry policy text based on the closed-domain triple extraction tool Close-IE; the specific structure is shown in fig. 4.
Step 5.2 first extracts the Subject and the Object and then classifies the relationship between the Subject and the Object, which specifically includes:
Step 5.2.1: BERT-Layer:
Closed-domain triple extraction likewise selects BERT as the encoding layer, because the BiLSTM network has weak text representation capability on the triple extraction task.
In the encoding layer (BERT-Layer) of fig. 4, BERT is used as the feature extraction layer to obtain the context information of the sentence.
Step 5.2.2: Entity-Layer:
in the entity extraction layer (Entity-Layer) of fig. 4, the start and end positions of the Subject and the Object are each extracted in a span manner. The calculation formulas are as follows:
$P_i^{start\_s} = \mathrm{sigmoid}(W_{start} h_i + b_{start})$
$P_i^{end\_s} = \mathrm{sigmoid}(W_{end} h_i + b_{end})$
$P_i^{start\_o} = \mathrm{sigmoid}(W_{start} h_i + b_{start})$
$P_i^{end\_o} = \mathrm{sigmoid}(W_{end} h_i + b_{end})$
wherein $P_i^{start\_s}$ represents the probability that the i-th token in the sentence is the start position of a Subject, $P_i^{end\_s}$ represents the probability that the i-th token is the end position of a Subject, $P_i^{start\_o}$ represents the probability that the i-th token is the start position of an Object, $P_i^{end\_o}$ represents the probability that the i-th token is the end position of an Object, $h_i$ represents the encoding of the i-th token in the sentence produced by BERT, $W_{(\cdot)}$ represents the model weights to be trained, and $b_{(\cdot)}$ is a bias term.
Step 5.2.3: Multihead-Layer:
in the MultiHead-Layer of fig. 4, each token in the sentence may have a relationship with other tokens; this layer finds the possible relationships between all different tokens. The calculation formula is as follows:
$P_{i,j} = \mathrm{sigmoid}(h_i, h_j)$
wherein $h_i$ represents the encoding of the i-th feature in the sentence, which is the feature of a Subject; $h_j$ represents the encoding of the j-th feature in the sentence, which is the feature of an Object; and $P_{i,j}$ represents the probability that $(h_i, h_j)$ can form a relationship.
Step 5.2.4: Triple-Result:
in the Triple-Result layer of fig. 4, the final set of (Subject, Predicate, Object) triples in the sentence is extracted from the results of the preceding steps.
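In the closed-domain setting the Predicate is chosen from a fixed relation set rather than extracted as a span. The sketch below shows one way such a classification could look, assuming a softmax over a hypothetical label set and a concatenated Subject and Object feature. Neither the labels nor the softmax scoring come from the patent, which only gives the pairwise sigmoid formula above.

```python
import math

# Hypothetical closed relation set; the patent does not list its labels.
RELATIONS = ["issued_by", "applies_to", "takes_effect_on"]

def softmax(scores):
    peak = max(scores)                       # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def classify_relation(h_subject, h_object, relation_weights):
    """Score every relation for a (Subject, Object) pair and return the best one."""
    pair = h_subject + h_object              # assumed: concatenated pair feature
    scores = [sum(w * x for w, x in zip(weights, pair))
              for weights in relation_weights]
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return RELATIONS[best], probs[best]

h_subject = [1.0, 0.0]
h_object = [0.0, 1.0]
relation_weights = [                         # illustrative, not trained
    [2.0, 0.0, 0.0, 2.0],
    [0.0, 2.0, 2.0, 0.0],
    [-1.0, 0.0, 0.0, -1.0],
]

label, prob = classify_relation(h_subject, h_object, relation_weights)
print(label, round(prob, 3))
```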
Step 6: Based on an entity alignment algorithm, fuse and assemble the extracted information of different types to form power industry policy trees of different classifications, thereby realizing power industry policy management.
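The patent does not specify the entity alignment algorithm. As a rough illustration of the fusion idea only, the sketch below aligns entity mentions through a hypothetical alias table and groups the triples into one nested tree per policy category. All names and data are invented for the example.

```python
def align(entity, aliases):
    """Map an entity mention to a canonical name (assumed alias-table alignment)."""
    key = entity.strip().lower()
    return aliases.get(key, key)

def build_policy_trees(classified_triples, aliases):
    """Group triples as category -> subject -> predicate -> list of objects."""
    trees = {}
    for category, (subject, predicate, obj) in classified_triples:
        subject = align(subject, aliases)
        obj = align(obj, aliases)
        objects = (trees.setdefault(category, {})
                        .setdefault(subject, {})
                        .setdefault(predicate, []))
        if obj not in objects:               # merge duplicate facts
            objects.append(obj)
    return trees

# Hypothetical alias table and classified triples.
aliases = {"gridco": "grid company", "grid co.": "grid company"}
classified_triples = [
    ("pricing", ("GridCo", "publishes", "tariff schedule")),
    ("pricing", ("Grid Co.", "publishes", "Tariff Schedule")),
    ("safety", ("Grid Company", "inspects", "substations")),
]

trees = build_policy_trees(classified_triples, aliases)
print(trees)
```

The two pricing triples refer to the same entity under different names, so after alignment they collapse into a single branch of the pricing tree.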
The present applicant has described and illustrated embodiments of the present invention in detail with reference to the accompanying drawings, but it should be understood by those skilled in the art that the above embodiments are merely preferred embodiments of the present invention, and the detailed description is only for the purpose of helping the reader to better understand the spirit of the present invention, and not for limiting the scope of the present invention, and on the contrary, any improvement or modification made based on the spirit of the present invention should fall within the scope of the present invention.

Claims (10)

1. A power industry policy management method based on classification tree fusion technology, characterized in that the method comprises the following steps:
step 1: acquiring a power industry policy text and preprocessing the data;
step 2: encoding the power industry policy text information after data preprocessing;
step 3: setting information attention weights for different sentences in the power industry policy text;
step 4: classifying the power industry policy text by adopting a classification tree fusion technology, based on the encoding and the attention weights of the power industry policy text information;
step 5: extracting the triple information of the different types of power industry policy texts;
step 6: fusing and assembling the extracted information of different types based on an entity alignment algorithm, thereby realizing power industry policy management.
2. The power industry policy management method based on classification tree fusion technology as claimed in claim 1, wherein:
the step 1 specifically comprises the following steps:
step 1.1: obtaining a power industry policy text, segmenting it into words with the jieba word segmentation tool, and deleting stop words from it by means of a stop word list;
step 1.2: after the preprocessing of step 1.1, looking up each word of the sentence in the vocabulary to obtain its position, and mapping each word to a word vector through the word embedding matrix;
step 1.3: based on the word vectors, extracting the information representation of each sentence in the power industry policy text with a convolutional neural network.
3. The power industry policy management method based on classification tree fusion technology as claimed in claim 2, wherein:
the step 1.3 specifically comprises:
step 1.3.1: concatenating the word vectors of the words in each power industry policy text sentence into a matrix to construct the sentence vector matrix of that sentence;
step 1.3.2: for the sentence vector matrix, setting convolution kernels of several different sizes in the convolutional layer to extract the shared information representation among different words;
step 1.3.3: extracting fixed-length information representations from sentences of different lengths by using K-Max pooling and Padding.
4. The power industry policy management method based on classification tree fusion technology as claimed in claim 2, wherein:
the step 2 specifically comprises the following steps:
sequentially inputting the information representation of each sentence into a BiLSTM network or a GRU network, in the order in which the sentences appear in the power industry policy text, to encode the power industry policy text information.
5. The electric power industry policy management method based on classification tree fusion technology as claimed in claim 4, wherein:
the step 3 specifically comprises the following steps:
setting information attention weights for different sentences in the power industry policy text by using an Attention mechanism, and outputting the power industry policy text vector encoding with the attention weight information added.
6. The electric power industry policy management method based on classification tree fusion technology as claimed in claim 5, wherein:
the step 4 specifically comprises the following steps:
inputting the power industry policy text vector encoding with the attention weight information added into a Softmax classifier to obtain a one-hot vector representation of the category to which the power industry policy text belongs, thereby realizing power industry policy text classification.
7. The power industry policy management method based on classification tree fusion technology as claimed in claim 1, wherein:
the step 5 specifically comprises the following steps:
step 5.1: extracting the triple information of the power industry policy text based on the open-domain triple extraction tool Open-IE: first extracting all possible Subjects and Predicates of the different types of power industry policies, then judging the association between the Subjects and the Predicates, and finally extracting the Objects corresponding to the Subjects and Predicates;
step 5.2: extracting the triple information of the power industry policy text based on the closed-domain triple extraction tool Close-IE: first extracting the Subject and the Object, and then classifying the relationship between the Subject and the Object.
8. The power industry policy management method based on classification tree fusion technology as claimed in claim 7, wherein:
the step 5.1 specifically comprises the following steps:
step 5.1.1: the encoding layer Encoder-Layer acquires the context information of the sentence;
step 5.1.2: the entity extraction layer EntityRelation-Layer extracts all possible Subjects and Predicates;
step 5.1.3: the Multihead-Layer finds all Subjects and Predicates that may be related;
step 5.1.4: the Object-Layer extracts the corresponding Object according to the specified Subject and Predicate;
step 5.1.5: Triple-Result extracts the final set of (Subject, Predicate, Object) triples in the sentence according to steps 5.1.1-5.1.4.
9. The power industry policy management method based on classification tree fusion technology as claimed in claim 8, wherein:
in step 5.1.2, the start and end positions of the Subject and the Predicate are each extracted in a span manner, and the formulas are as follows:
$P_i^{start\_s} = \mathrm{sigmoid}(W_{start} h_i + b_{start})$
$P_i^{end\_s} = \mathrm{sigmoid}(W_{end} h_i + b_{end})$
$P_i^{start\_p} = \mathrm{sigmoid}(W_{start} h_i + b_{start})$
$P_i^{end\_p} = \mathrm{sigmoid}(W_{end} h_i + b_{end})$
wherein $P_i^{start\_s}$ represents the probability that the i-th token in the sentence is the start position of a Subject, $P_i^{end\_s}$ represents the probability that the i-th token is the end position of a Subject, $P_i^{start\_p}$ represents the probability that the i-th token is the start position of a Predicate, $P_i^{end\_p}$ represents the probability that the i-th token is the end position of a Predicate, $h_i$ represents the encoding of the i-th token in the sentence produced by BERT, $W_{(\cdot)}$ represents the model weights to be trained, and $b_{(\cdot)}$ is a bias term;
the formula used in step 5.1.3 is as follows:
$P_{i,j} = \mathrm{sigmoid}(h_i, h_j)$
wherein $h_i$ represents the encoding of the i-th feature in the sentence, which is the feature of a Subject; $h_j$ represents the encoding of the j-th feature in the sentence, which is the feature of a Predicate; and $P_{i,j}$ represents the probability that $(h_i, h_j)$ can form a relationship;
the formulas used in step 5.1.4 are as follows:
$P_i^{start\_o} = \mathrm{sigmoid}(W_{start\_o}(h_i, V_s, V_p) + b_{start\_o})$
$P_i^{end\_o} = \mathrm{sigmoid}(W_{end\_o}(h_i, V_s, V_p) + b_{end\_o})$
wherein $P_i^{start\_o}$ represents the probability that the i-th token in the sentence is the start position of the Object, $P_i^{end\_o}$ represents the probability that the i-th token is the end position of the Object, $V_s$ represents the sum of the head and tail features of the Subject, and $V_p$ represents the sum of the head and tail features of the Predicate.
10. The power industry policy management method based on classification tree fusion technology as claimed in claim 7, wherein:
the step 5.2 specifically comprises the following steps:
step 5.2.1: the BERT encoding layer BERT-Layer acquires the context information of the sentence;
step 5.2.2: the entity extraction layer Entity-Layer extracts all possible Subjects and Objects;
step 5.2.3: the Multihead-Layer finds the possible relationships among all different tokens in the sentence;
step 5.2.4: Triple-Result extracts the final set of (Subject, Predicate, Object) triples in the sentence according to steps 5.2.1-5.2.3.
CN202111256627.XA 2021-10-27 2021-10-27 Power industry policy management method based on classification tree fusion technology Pending CN113822599A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111256627.XA CN113822599A (en) 2021-10-27 2021-10-27 Power industry policy management method based on classification tree fusion technology

Publications (1)

Publication Number Publication Date
CN113822599A true CN113822599A (en) 2021-12-21

Family

ID=78918927

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111256627.XA Pending CN113822599A (en) 2021-10-27 2021-10-27 Power industry policy management method based on classification tree fusion technology

Country Status (1)

Country Link
CN (1) CN113822599A (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083700A (en) * 2019-03-19 2019-08-02 北京中兴通网络科技股份有限公司 A kind of enterprise's public sentiment sensibility classification method and system based on convolutional neural networks
CN111767741A (en) * 2020-06-30 2020-10-13 福建农林大学 Text emotion analysis method based on deep learning and TFIDF algorithm
CN112100397A (en) * 2020-09-07 2020-12-18 南京航空航天大学 Electric power plan knowledge graph construction method and system based on bidirectional gating circulation unit
CN112199491A (en) * 2020-10-14 2021-01-08 中国科学院计算技术研究所厦门数据智能研究院 Method for extracting relational five-tuple based on BERT and priori knowledge characteristics
CN112560475A (en) * 2020-11-16 2021-03-26 和美(深圳)信息技术股份有限公司 Triple extraction method and system
CN112613315A (en) * 2020-12-29 2021-04-06 重庆农村商业银行股份有限公司 Text knowledge automatic extraction method, device, equipment and storage medium
EP3839818A2 (en) * 2020-09-29 2021-06-23 Beijing Baidu Netcom Science And Technology Co. Ltd. Method and apparatus for performing structured extraction of text, device and storage medium
CN113312917A (en) * 2021-05-28 2021-08-27 国网江苏省电力有限公司电力科学研究院 Entity relation extraction method and system based on knowledge reasoning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SUNCONG ZHENG ETAL.: "Joint Extraction of Entities and Relations Based on a Novel Tagging Scheme", 《PROCEEDINGS OF THE TWENTY-NINTH INTERNATIONAL JOINT CONFERENCE ON ARTIFICIAL INTELLIGENCE (IJCAI-20)》, 31 December 2017 (2017-12-31), pages 1227 - 1236 *
张龙辉 等: "BSLRel:基于二元序列标注的级联关系三元组抽取模型", 《中文信息学报》, vol. 35, no. 6, 30 June 2021 (2021-06-30), pages 74 - 84 *

Similar Documents

Publication Publication Date Title
CN110413785B (en) Text automatic classification method based on BERT and feature fusion
CN111522839B (en) Deep learning-based natural language query method
CN109582789A (en) Text multi-tag classification method based on semantic primitive information
CN112487812B (en) Nested entity identification method and system based on boundary identification
CN111858932A (en) Multiple-feature Chinese and English emotion classification method and system based on Transformer
CN116702091B (en) Multi-mode ironic intention recognition method, device and equipment based on multi-view CLIP
CN113743119B (en) Chinese named entity recognition module, method and device and electronic equipment
CN111984791B (en) Attention mechanism-based long text classification method
CN115858788A (en) Visual angle level text emotion classification system based on double-graph convolutional neural network
CN116956929B (en) Multi-feature fusion named entity recognition method and device for bridge management text data
CN110046356A (en) Label is embedded in the application study in the classification of microblogging text mood multi-tag
CN113051887A (en) Method, system and device for extracting announcement information elements
CN112784580A (en) Financial data analysis method and device based on event extraction
CN115098673A (en) Business document information extraction method based on variant attention and hierarchical structure
CN114299326A (en) Small sample classification method based on conversion network and self-supervision
CN114445832A (en) Character image recognition method and device based on global semantics and computer equipment
CN113869054A (en) Deep learning-based electric power field project feature identification method
CN113157918A (en) Commodity name short text classification method and system based on attention mechanism
CN116822513A (en) Named entity identification method integrating entity types and keyword features
CN116204643A (en) Cascade label classification method based on multi-task learning knowledge enhancement
CN113822599A (en) Power industry policy management method based on classification tree fusion technology
CN114996442A (en) Text abstract generation system combining abstract degree judgment and abstract optimization
CN114896404A (en) Document classification method and device
CN114611489A (en) Text logic condition extraction AI model construction method, extraction method and system
CN114510569A (en) Chemical emergency news classification method based on Chinesebert model and attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination