CN113901815A - Emergency working condition event detection method based on dam operation log - Google Patents

Emergency working condition event detection method based on dam operation log

Info

Publication number
CN113901815A
CN113901815A (application CN202111202004.4A)
Authority
CN
China
Prior art keywords
vector
dam
sentence
document
word segments
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111202004.4A
Other languages
Chinese (zh)
Other versions
CN113901815B (en)
Inventor
孙卫
周华
迟福东
毛莺池
李然
陈豪
王龙宝
程永
卢俊
钟鸣
夏旭东
李玲
赵欢
罗松
马建平
袁溯
吴胜亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hohai University HHU
Huaneng Group Technology Innovation Center Co Ltd
Huaneng Lancang River Hydropower Co Ltd
Original Assignee
Hohai University HHU
Huaneng Group Technology Innovation Center Co Ltd
Huaneng Lancang River Hydropower Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hohai University HHU, Huaneng Group Technology Innovation Center Co Ltd, Huaneng Lancang River Hydropower Co Ltd filed Critical Hohai University HHU
Priority to CN202111202004.4A priority Critical patent/CN113901815B/en
Publication of CN113901815A publication Critical patent/CN113901815A/en
Application granted granted Critical
Publication of CN113901815B publication Critical patent/CN113901815B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0635Risk analysis of enterprise or organisation activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/20Administration of product repair or maintenance
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/08Construction
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Computing Systems (AREA)
  • Strategic Management (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Business, Economics & Management (AREA)
  • Marketing (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Primary Health Care (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses an emergency working condition event detection method oriented to dam operation logs. The method constructs a dam emergency working condition event type set; encodes all word segments in the dam operation log and converts them into the corresponding embedded vectors; fuses the embedded vector, named-entity type and part-of-speech tagging vector of each word segment to strengthen its semantic information; and fuses context information through sentence-document dual attention, where sentence-level attention highlights the important words in each sentence that can trigger events and document-level attention highlights the important sentences in each log document that can trigger events, strengthening the local and global semantic information of the word segments and alleviating the word ambiguity and word-trigger mismatch problems of traditional Chinese event detection. To avoid the imbalance between positive and negative binary-classification samples caused by each sentence in a typical dam log document containing at most two events, a model trained with a Focal loss function is adopted for event detection, and all documents are classified based on the events each document contains.

Description

Emergency working condition event detection method based on dam operation log
Technical Field
The invention relates to an emergency working condition event detection method based on dam operation logs, which performs event detection on dam operation logs in the hydraulic engineering field, specifically detecting the various special working condition events experienced by a dam over long-term operation and the corresponding response events, and belongs to the technical field of natural language processing.
Background
The task of event detection is to identify event trigger words from large-scale unstructured natural language text and to correctly classify the event types; a trigger word is the core word or phrase that most clearly expresses the occurrence of an event. Event detection is important for event semantic modeling and facilitates the subsequent structured management and storage of events.
In the field of hydraulic engineering, dam facilities provide multiple functions such as flood control, ice control, water storage, water supply and power generation, and are a mainstay of water conservancy development in China. Over decades of long-term operation, a dam may encounter many natural risk events, such as floods, earthquakes and rainstorms, which can endanger the structural safety of the dam and the lives and property of people downstream. Therefore, after such a special event occurs, dam managers arrange comprehensive special inspections to maintain the dam structure. In addition, routine inspection and overhaul of the dam are important measures for guaranteeing the safety of the dam body. After each of these activities, inspection personnel record the cause of the inspection event and the inspection results in text form, producing dam operation log files.
By processing the dam operation logs, the structural and safety condition of the dam itself can be analyzed and a dam event knowledge base can be formed, raising the level of intelligent dam management. An emergency working condition event detection method oriented to dam operation logs can skip event trigger identification, detect all predefined events in the dam operation logs, and classify the event type to which each document belongs, providing a foundation for subsequent tasks such as event extraction, event graph construction and event knowledge base construction.
Chinese text contains many ambiguities, and an event generally consists of an event trigger and event arguments. Most event triggers are verbs, and word ambiguity and mismatch between triggers and words are common, so event detection methods centered on trigger identification are prone to classification errors.
Disclosure of Invention
Purpose of the invention: aiming at the problems in the prior art, namely the various natural events and corresponding measure events that occur during dam operation and the lack of standardized records for these events, the invention provides an emergency working condition event detection method based on dam operation logs. The method avoids explicit trigger identification by simulating the trigger within each sentence, detects dam special working condition events from the dam operation logs, classifies the event type to which each document belongs, and provides a basis for subsequent event extraction.
The technical scheme is as follows: an emergency working condition event detection method based on dam operation logs comprises the following steps:
(1) Log file preprocessing: first sort and split the dam operation logs by recording date, label each document, then order, label and segment the sentences in each document, perform entity-type labeling and part-of-speech labeling on each word segment, and construct the dam emergency working condition event type set; sorting refers to ordering logs of different dates, and splitting refers to splitting same-day logs according to document content;
(2) Encoding vector embedding: encode all word segments in the dam operation log with an ALBERT pre-training model and convert them into the embedded vectors corresponding to the word segments;
(3) BiLSTM feature fusion: use a BiLSTM to fuse the embedded vector, named-entity-type vector and part-of-speech tagging vector corresponding to each word segment, strengthening the semantic information of the word segments;
(4) Dual-attention semantic reinforcement: fuse context information with sentence-document dual attention; sentence-level attention up-weights the important words in each sentence that can trigger events, and document-level attention up-weights the important sentences in each log document that can trigger events, alleviating the word ambiguity and word-trigger mismatch problems of traditional Chinese event detection;
(5) Train the model with a Focal loss function and perform classification: to avoid the imbalance between positive and negative binary-classification samples caused by each sentence in a typical dam log document containing at most two events, a model trained with a Focal loss function is used to classify the events to which all documents belong.
The dam emergency working condition event type set comprises typical events such as earthquake, rainstorm, flood discharge, pre-flood safety inspection, comprehensive special inspection, routine overhaul and routine inspection.
The named entity types comprise person name, department, position, time, date, measured value, percentage, defect type and the like; the part-of-speech tags comprise noun, verb, adjective, quantifier, pronoun and the like.
Further, the step (1) comprises the following steps:
(1.1) Divide the dam operation log file into several documents according to the log recording date, order and label each document, order and label the sentences in each document, and use jieba to perform word segmentation in units of words;
(1.2) Perform entity-type labeling and part-of-speech labeling on the word segmentation results: the entity-type labels are converted into low-dimensional vectors by looking up a randomly initialized embedding table, and part-of-speech labeling uses Stanford CoreNLP to tag the part of speech of each word, which is then converted into a low-dimensional vector by looking up the corresponding embedding table;
(1.3) Predefine the dam emergency working condition event types, including typical events such as earthquake, rainstorm, flood discharge, pre-flood safety inspection, comprehensive special inspection, routine overhaul and routine inspection.
Further, the step (2) comprises the following steps:
Use an ALBERT pre-training model to encode all the word segments obtained in (1.1), converting them into vector representations that a computer can process.
Further, the step (3) comprises the following steps:
(3.1) Concatenate the embedded vector, entity-type vector and part-of-speech tagging vector corresponding to each word segment, where the embedded vector is the vector obtained in step (2), the entity-type vector is the mathematical vector corresponding to the named-entity recognition results of all word segments (such as person name, organization, position, time, date, numerical value and percentage), and the part-of-speech tagging vector is the mathematical vector corresponding to the part-of-speech tagging results of all words (such as noun, verb, adjective, numeral and pronoun);
(3.2) Process the concatenated vectors within a single sentence with a BiLSTM model, each vector being one input; the bidirectional LSTM units capture word context information and output a forward hidden state and a backward hidden state for each word segment, which are combined into an output vector h_i.
Further, the step (4) comprises the following steps:
(4.1) In the training set, convert the predefined emergency working condition events contained in each sentence into an embedded vector t_1 by looking up a randomly initialized embedding table, and convert each document into an embedded vector d using Doc2Vec;
(4.2) For all word segments in each sentence, calculate the weight of each word segment in the sentence with a local attention mechanism, raising the attention weight of words that trigger the target event type and thereby simulating a trigger. The calculation formula is:
α_s^(k) = exp(h_k·t_1^T) / Σ_j exp(h_j·t_1^T)
where h_k is the kth part of the output vector h, α_s^(k) is the kth part of the local attention vector α_s, and t_1^T is the transpose of the event type embedding vector; a trigger means an event trigger, i.e. a word, usually a verb, that triggers a certain event;
(4.3) For all word segments in each sentence, calculate the weight of the sentence containing the word segment within its document with a global attention mechanism, obtaining the specific meaning of the trigger in this context, assisting in judging the event type of the sentence, and resolving trigger ambiguity with context information. The calculation formula is:
α_d^(k) = exp(h_k·t_2^T + h_k·d^T) / Σ_j exp(h_j·t_2^T + h_j·d^T)
where h_k is the kth part of the output vector h, α_d^(k) is the kth part of the global attention vector α_d, t_2^T is the transpose of the event type embedding vector, and d^T is the transpose of the document-level embedded vector;
(4.4) Weight and fuse the local attention and the global attention to improve event detection accuracy. The weighting vectors and the weighted fusion for an event are calculated as:
v_s = α_s·t_1
v_d = α_d·t_2
o = σ(λ·v_s + (1-λ)·v_d)
where the final output value o consists of two parts, v_s and v_d: v_s is generated by the dot product of α_s and the event type embedding vector t_1 and captures local features while simulating a hidden event trigger; v_d is generated by the dot product of α_d and t_2 and captures global features and context information. σ is the Sigmoid function, and λ ∈ [0,1] is a hyper-parameter that trades off v_s against v_d.
Further, the step (5) comprises the following steps:
the data set is processed in sentence units, with < sentence, event type > pairs constituting training data representing whether a given sentence conveys a t-type event, the event type label is 1 or 0, e.g. < dam bank, hub side slope and road inspection: no abnormity, wherein the label of the daily inspection is more than 1, and the inspection conditions of the bank near the dam, the slope in the hub area and the road are less than: the label of the earthquake > training pair is 0, and because the number of events possibly expressed by a single sentence is less than the predefined number of events, aiming at the problem that the number of negative samples is far larger than the number of positive samples caused by two-classification identification, a model obtained by training a Focal loss function is introduced to strengthen the influence of the positive samples and the hard samples on the model, and the calculation formula is as follows:
Figure BDA0003305279490000041
where x is composed of the sentence and the target event type, y ∈ {0,1}, o (x)(i)) Is the predicted value of the model, | theta | | Y luminance2Is the sum of squares of the individual elements in the modelδ > 0 is the weight of the L2 normalization term, β is a parameter of the positive-negative weight ratio of the balance samples, γ is a parameter of the hard-to-classify and easy-to-classify weight ratio of the balance samples, and β is set to 0.25 and γ is set to 2 in this experiment.
Finally, perform event detection on the dam operation log files with the trained model, and classify the documents based on the event types contained in each document.
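The document-level classification described above can be a simple aggregation over the trained sentence-level scorer. The sketch below is one way to do this; the function signature, the 0.5 decision threshold and the aggregation by set union are assumptions for illustration, not details given by the patent.

```python
from typing import Callable, Dict, List, Set

def classify_documents(
    documents: Dict[str, List[str]],          # document id -> list of sentences
    event_types: List[str],                   # predefined dam emergency event types
    score: Callable[[str, str], float],       # trained model: (sentence, event type) -> probability o
    threshold: float = 0.5,                   # assumed decision threshold
) -> Dict[str, Set[str]]:
    """Assign to each log document the set of event types conveyed by its sentences."""
    result: Dict[str, Set[str]] = {}
    for doc_id, sentences in documents.items():
        detected: Set[str] = set()
        for sent in sentences:
            for t in event_types:
                if score(sent, t) >= threshold:   # the sentence conveys a t-type event
                    detected.add(t)
        result[doc_id] = detected
    return result
```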
An emergency working condition event detection system based on dam operation logs, for performing event detection on dam operation logs in the water conservancy field, comprises:
a log file preprocessing module: first sort and split the dam operation logs by recording date, label each document, order, label and segment the sentences in each document, perform entity-type labeling and part-of-speech labeling on each word segment, and then construct the dam emergency working condition event type set;
an encoding vector embedding module: encode all word segments in the dam operation log with an ALBERT pre-training model and convert them into the embedded vectors corresponding to the word segments;
a BiLSTM feature fusion module: use a BiLSTM to fuse the embedded vector, named-entity-type vector and part-of-speech tagging vector corresponding to each word segment, strengthening the semantic information of the word segments;
a dual-attention semantic reinforcement module: fuse context information with sentence-document dual attention;
a model: trained with a Focal loss function; the trained model classifies the events to which all documents belong.
A computer device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor implements the method for detecting emergency condition events based on the dam operation log as described above when executing the computer program.
A computer readable storage medium having stored thereon a computer program for executing the method for emergency condition event detection based on a dam operation log as described above.
Beneficial effects: compared with the prior art, the emergency working condition event detection method based on dam operation logs provided by the invention captures keyword and sentence-level semantic information through local attention, simulates hidden event triggers, and realizes event detection without triggers; through global attention it introduces rich document-level context information to help judge the meaning of words in their actual context, skips the trigger identification step, and judges the event type directly. This avoids the mismatch between Chinese words and triggers and the word ambiguity problem, and improves event detection accuracy.
Drawings
FIG. 1 is a flow chart of model training according to an embodiment of the present invention;
FIG. 2 is a diagram of a model training framework according to an embodiment of the present invention.
Detailed Description
The present invention is further illustrated by the following examples, which are purely exemplary and are not intended to limit the scope of the invention; various equivalent modifications of the invention that occur to those skilled in the art after reading the present disclosure fall within the scope of the appended claims.
As shown in fig. 1, the method for detecting an emergency condition event based on a dam operation log mainly comprises the following steps:
step (1) preprocessing a dam operation log file
(1.1) Divide the dam operation log file into several documents according to the log recording date and use them as the training set; order and label each document, order and label the sentences in each document, and use jieba to perform word segmentation in units of words. As shown in Fig. 2, the sentence "Inspection of the near-dam bank slope, hub-area side slopes and roads: no abnormality" is first segmented into "near-dam", "bank slope", "hub area", "side slope", "and", "road", "inspection", "condition", ":", "no" and "abnormality".
(1.2) Perform entity-type labeling and part-of-speech labeling on the segmentation results: convert the entity-type labels into low-dimensional vectors by looking up a randomly initialized embedding table, use Stanford CoreNLP to tag the part of speech of each word, and then convert the part-of-speech labels into low-dimensional vectors by looking up the embedding table.
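As a minimal sketch of steps (1.1) and (1.2), jieba performs the word segmentation and randomly initialized tables supply the label embeddings. The example sentence, the tag inventories and the 16-dimensional embedding size below are illustrative assumptions, not values taken from the patent (which uses Stanford CoreNLP to produce the part-of-speech labels themselves).

```python
import jieba
import numpy as np

# Illustrative log sentence (not the patent's own example text)
sentence = "大坝巡视情况：无异常"
tokens = jieba.lcut(sentence)          # word segmentation in units of words
print(tokens)

# Randomly initialized embedding tables for entity-type and part-of-speech labels.
ENTITY_TAGS = ["O", "PERSON", "DEPARTMENT", "POSITION", "TIME", "DATE",
               "MEASURED_VALUE", "PERCENT", "DEFECT_TYPE"]
POS_TAGS = ["noun", "verb", "adjective", "quantifier", "pronoun", "other"]

rng = np.random.default_rng(0)
entity_table = {tag: rng.normal(size=16) for tag in ENTITY_TAGS}
pos_table = {tag: rng.normal(size=16) for tag in POS_TAGS}

# Look up the low-dimensional vectors for one (assumed) labeled token.
ent_vec = entity_table["O"]
pos_vec = pos_table["noun"]
```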
(1.3) Predefine the dam emergency working condition event types for the dam operation logs; typical events include earthquake, rainstorm, flood discharge, pre-flood safety inspection, comprehensive special inspection, routine overhaul and routine inspection.
Step (2): encode the word segments into word vectors
Use an ALBERT pre-training model to encode all the word segments obtained in (1.1), converting them into vector representations that a computer can process.
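A minimal sketch of this encoding step using the HuggingFace transformers library. The checkpoint name, the BERT-style tokenizer pairing and the sub-token mean-pooling back to word segments are assumptions, since the patent only states that an ALBERT pre-training model produces the word-segment embeddings.

```python
import torch
from transformers import AlbertModel, BertTokenizerFast

# Checkpoint name is an assumption; Chinese ALBERT weights are commonly released
# with a BERT-style tokenizer.
name = "voidful/albert_chinese_base"
tokenizer = BertTokenizerFast.from_pretrained(name)
albert = AlbertModel.from_pretrained(name)

segments = ["大坝", "巡视", "无", "异常"]          # jieba word segments (illustrative)
enc = tokenizer(segments, is_split_into_words=True, return_tensors="pt")

with torch.no_grad():
    hidden = albert(**enc).last_hidden_state       # (1, num_subtokens, hidden_size)

# Mean-pool sub-token states back to one embedding per word segment
# (a pooling choice assumed here, not prescribed by the patent).
word_ids = enc.word_ids(0)
embeddings = torch.stack([
    hidden[0, [j for j, w in enumerate(word_ids) if w == i]].mean(dim=0)
    for i in range(len(segments))
])                                                 # (num_segments, hidden_size)
```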
Step (3): concatenate the word vector, named-entity-type vector and part-of-speech tagging vector, then extract semantic information.
(3.1) Concatenate the embedded vector, entity-type vector and part-of-speech tagging vector corresponding to each word segment, where the embedded vector is the vector obtained in step (2), the entity-type vector is the mathematical vector corresponding to the named-entity recognition results of all word segments (such as person name, organization, position, time, date, measured value, percentage and defect type), and the part-of-speech tagging vector is the mathematical vector corresponding to the part-of-speech tagging results of all word segments (such as noun, verb, adjective, quantifier and pronoun).
(3.2) Process the concatenated vectors within a single sentence with a BiLSTM model, each vector being one input; the bidirectional LSTM units capture word context information and output a forward hidden state and a backward hidden state for each word segment, which are combined into an output vector h_i.
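A sketch of this fusion step in PyTorch; the embedding dimensions and hidden size are illustrative assumptions.

```python
import torch
import torch.nn as nn

class FeatureFusionBiLSTM(nn.Module):
    """Concatenate the ALBERT word-segment embedding with the entity-type and
    part-of-speech vectors, then run a BiLSTM over the sentence.
    Dimensions are illustrative assumptions."""

    def __init__(self, word_dim=768, ent_dim=16, pos_dim=16, hidden_dim=128):
        super().__init__()
        self.bilstm = nn.LSTM(word_dim + ent_dim + pos_dim, hidden_dim,
                              batch_first=True, bidirectional=True)

    def forward(self, word_emb, ent_emb, pos_emb):
        # word_emb: (batch, seq, word_dim); ent_emb, pos_emb: (batch, seq, 16)
        x = torch.cat([word_emb, ent_emb, pos_emb], dim=-1)
        h, _ = self.bilstm(x)      # forward and backward states combined per word segment
        return h                   # (batch, seq, 2 * hidden_dim)

# usage with random tensors
fusion = FeatureFusionBiLSTM()
h = fusion(torch.randn(1, 12, 768), torch.randn(1, 12, 16), torch.randn(1, 12, 16))
```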
Step (4): capture sentence-level and document-level context with a dual attention mechanism, enhance the word vector representations, and simulate triggers.
(4.1) In the training set, convert the events contained in each sentence into an embedded vector t_1 by looking up a randomly initialized embedding table, and convert each document into an embedded vector d using Doc2Vec.
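For the document-level vector d, a gensim Doc2Vec model is one way to implement this sub-step; the tokenized documents and the hyper-parameters below are illustrative assumptions.

```python
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

# Each log document is its list of word segments; tags identify the document.
docs = [
    TaggedDocument(words=["大坝", "巡视", "无", "异常"], tags=["doc_0"]),
    TaggedDocument(words=["泄洪", "闸门", "开启"], tags=["doc_1"]),
]

# vector_size, min_count and epochs are illustrative hyper-parameters.
d2v = Doc2Vec(vector_size=128, min_count=1, epochs=40)
d2v.build_vocab(docs)
d2v.train(docs, total_examples=d2v.corpus_count, epochs=d2v.epochs)

d = d2v.dv["doc_0"]   # document-level embedded vector d used by the global attention
```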
(4.2) For all word segments in each sentence, calculate the weight of each word segment in the sentence with a local attention mechanism, raising the attention weight of words that trigger the target event type and thereby simulating a trigger. The calculation formula is:
α_s^(k) = exp(h_k·t_1^T) / Σ_j exp(h_j·t_1^T)
where h_k is the kth part of the output vector h, α_s^(k) is the kth part of the local attention vector α_s, and t_1^T is the transpose of the event type embedding vector. As shown in Fig. 2, the event type embedding assists the local attention mechanism so that a trigger is simulated for each word segment.
(4.3) For all word segments in each sentence, calculate the weight of the sentence containing the word segment within its document with a global attention mechanism, obtaining the specific meaning of the trigger in this context, assisting in judging the event type of the sentence, and resolving trigger ambiguity with context information. The calculation formula is:
α_d^(k) = exp(h_k·t_2^T + h_k·d^T) / Σ_j exp(h_j·t_2^T + h_j·d^T)
where h_k is the kth part of the output vector h, α_d^(k) is the kth part of the global attention vector α_d, t_2^T is the transpose of the event type embedding vector, and d^T is the transpose of the document-level embedded vector. As shown in Fig. 2, the document-level vector assists the global attention and avoids the ambiguity problems caused by local attention alone.
(4.4) Weight and fuse the local attention and the global attention to improve event detection accuracy. The formulas are:
v_s = α_s·t_1
v_d = α_d·t_2
o = σ(λ·v_s + (1-λ)·v_d)
where the final output value o consists of two parts, v_s and v_d: v_s is generated by the dot product of α_s and t_1 and captures local features while simulating a hidden event trigger; v_d is generated by the dot product of α_d and t_2 and captures global features and context information. σ is the Sigmoid function, and λ ∈ [0,1] is a hyper-parameter that trades off v_s against v_d.
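A sketch of the sentence-document dual attention and the weighted fusion above. The exact composition of v_s and v_d is given only as formulas in the original drawings, so the softmax normalization, the way the document vector enters the global score, and the scalar scores used below are assumptions chosen to be consistent with the surrounding description.

```python
import torch

def dual_attention_score(h, t1, t2, d, lam=0.5):
    """Score one <sentence, event type> pair.

    h  : (seq_len, dim) BiLSTM outputs for the sentence
    t1 : (dim,) event-type embedding used by sentence-level (local) attention
    t2 : (dim,) event-type embedding used by document-level (global) attention
    d  : (dim,) Doc2Vec document embedding (assumed projected to the same size)
    lam: trade-off hyper-parameter lambda in [0, 1]
    """
    alpha_s = torch.softmax(h @ t1, dim=0)            # per-word local weights
    alpha_d = torch.softmax(h @ t2 + h @ d, dim=0)    # per-word global weights

    # Attended sentence representations scored against the event-type embeddings
    # (one reading of v_s = alpha_s . t1 and v_d = alpha_d . t2).
    v_s = (alpha_s.unsqueeze(1) * h).sum(dim=0) @ t1
    v_d = (alpha_d.unsqueeze(1) * h).sum(dim=0) @ t2
    return torch.sigmoid(lam * v_s + (1.0 - lam) * v_d)   # o in (0, 1)

# usage with random tensors of an assumed size
h = torch.randn(12, 256)
o = dual_attention_score(h, torch.randn(256), torch.randn(256), torch.randn(256))
```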
Step (5): adopt a Focal loss function to avoid the positive and negative sample imbalance problem and classify all documents.
The data set is processed in sentence units; <sentence, event type> pairs constitute the training data and indicate whether a given sentence conveys an event of type t, with a label of 1 or 0. For example, the training pair <"Inspection of the near-dam bank slope, hub-area side slopes and roads: no abnormality", routine inspection> has label 1, while <"Inspection of the near-dam bank slope, hub-area side slopes and roads: no abnormality", earthquake> has label 0. Because the number of events a single sentence can express is smaller than the number of predefined events, binary-classification recognition makes the number of negative samples far larger than the number of positive samples; a Focal loss function is therefore introduced to strengthen the influence of positive samples and hard samples on the model. The calculation formula is:
J(θ) = -Σ_i [ β·y^(i)·(1 - o(x^(i)))^γ·log(o(x^(i))) + (1-β)·(1 - y^(i))·(o(x^(i)))^γ·log(1 - o(x^(i))) ] + δ·||θ||²
where x consists of the sentence and the target event type, y ∈ {0,1}, o(x^(i)) is the predicted value of the model, ||θ||² is the sum of squares of the individual elements of the model parameters, δ > 0 is the weight of the L2 regularization term, β is a parameter balancing the weight ratio of positive and negative samples, and γ is a parameter balancing the weight ratio of hard-to-classify and easy-to-classify samples; in this experiment β is set to 0.25 and γ to 2.
Finally, perform event detection on the dam operation log files with the trained model, and classify the documents based on the event types contained in each document.
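A sketch of the training loss described above in PyTorch; the epsilon stabilizer, the sum reduction and the way the L2 term is passed in are implementation assumptions.

```python
import torch

def focal_loss(o, y, params=None, beta=0.25, gamma=2.0, delta=1e-4):
    """Binary Focal loss with class-balancing weight beta, focusing parameter gamma
    and an L2 term weighted by delta.

    o : (N,) predicted probabilities for <sentence, event type> pairs
    y : (N,) gold labels in {0, 1}
    """
    eps = 1e-8
    pos = beta * y * (1.0 - o).pow(gamma) * torch.log(o + eps)
    neg = (1.0 - beta) * (1.0 - y) * o.pow(gamma) * torch.log(1.0 - o + eps)
    loss = -(pos + neg).sum()
    if params is not None:                 # L2 regularization term delta * ||theta||^2
        loss = loss + delta * sum(p.pow(2).sum() for p in params)
    return loss

# usage with assumed toy values
o = torch.tensor([0.9, 0.2, 0.7])
y = torch.tensor([1.0, 0.0, 1.0])
print(focal_loss(o, y))
```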
An emergency working condition event detection system based on dam operation logs, for performing event detection on dam operation logs in the water conservancy field, comprises:
a log file preprocessing module: first sort and split the dam operation logs by recording date, label each document, order, label and segment the sentences in each document, perform entity-type labeling and part-of-speech labeling on each word segment, and then construct the dam emergency working condition event type set;
an encoding vector embedding module: encode all word segments in the dam operation log with an ALBERT pre-training model and convert them into the embedded vectors corresponding to the word segments;
a BiLSTM feature fusion module: use a BiLSTM to fuse the embedded vector, named-entity-type vector and part-of-speech tagging vector corresponding to each word segment, strengthening the semantic information of the word segments;
a dual-attention semantic reinforcement module: fuse context information with sentence-document dual attention;
a model: trained with a Focal loss function; the trained model classifies the events to which all documents belong.
It should be apparent to those skilled in the art that the above steps of the dam-operation-log-based emergency working condition event detection method or system according to the embodiments of the present invention may be implemented by a general-purpose computing device. They may be centralized on a single computing device or distributed over a network formed by multiple computing devices; alternatively, they may be implemented by program code executable by computing devices, so that they can be stored in a storage device and executed by the computing devices. In some cases, the steps shown or described may be executed in a different order, or they may be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, embodiments of the invention are not limited to any specific combination of hardware and software.

Claims (9)

1. An emergency working condition event detection method based on dam operation logs, for performing event detection on dam operation logs in the water conservancy field, characterized by comprising the following steps:
(1) log file preprocessing: first sort and split the dam operation logs by recording date, label each document, order, label and segment the sentences in each document, perform entity-type labeling and part-of-speech labeling on each word segment, and then construct a dam emergency working condition event type set;
(2) encoding vector embedding: encode all word segments in the dam operation log with an ALBERT pre-training model and convert them into the embedded vectors corresponding to the word segments;
(3) BiLSTM feature fusion: use a BiLSTM to fuse the embedded vector, named-entity-type vector and part-of-speech tagging vector corresponding to each word segment, strengthening the semantic information of the word segments;
(4) dual-attention semantic reinforcement: fuse context information with sentence-document dual attention;
(5) train the model with a Focal loss function and perform classification: use the Focal-loss-trained model to classify the events to which all documents belong.
2. The dam-operation-log-based emergency working condition event detection method according to claim 1, wherein step (1) comprises the following steps:
(1.1) dividing the dam operation log file into several documents according to the log recording date, ordering and labeling each document, ordering and labeling the sentences in each document, and using jieba to perform word segmentation in units of words;
(1.2) performing entity-type labeling and part-of-speech labeling on the segmentation results, wherein the entity-type labels are converted into low-dimensional vectors by looking up a randomly initialized embedding table, and part-of-speech labeling uses Stanford CoreNLP to tag the part of speech of each word, which is then converted into a low-dimensional vector by looking up the embedding table;
(1.3) predefining the dam emergency working condition event types, including typical events such as earthquake, rainstorm, flood discharge, pre-flood safety inspection, comprehensive special inspection, routine overhaul and routine inspection.
3. The dam-operation-log-based emergency working condition event detection method according to claim 1, wherein in step (2), an ALBERT pre-training model is used to encode all word segments and convert them into vector representations that a computer can process.
4. The dam-operation-log-based emergency working condition event detection method according to claim 1, wherein step (3) comprises the following steps:
(3.1) concatenating the embedded vector, the entity-type vector and the part-of-speech tagging vector corresponding to each word segment, wherein the embedded vector is the vector obtained in step (2), the entity-type vector is the mathematical vector of the named-entity recognition results of all word segments, and the part-of-speech tagging vector is the mathematical vector of the part-of-speech tagging results of all words;
(3.2) processing the concatenated vectors within a single sentence with a BiLSTM model, each vector being one input, capturing word context information with bidirectional LSTM units, outputting a forward hidden state and a backward hidden state for each word segment, and combining the two into an output vector h_i.
5. The dam-operation-log-based emergency working condition event detection method according to claim 1, wherein step (4) comprises the following steps:
(4.1) in the training set, converting the predefined emergency working condition events contained in each sentence into an embedded vector t_1 by looking up a randomly initialized embedding table, and converting each document into an embedded vector d using Doc2Vec;
(4.2) for all word segments in each sentence, calculating the weight of each word segment in the sentence with a local attention mechanism, raising the attention weight of words that trigger the target event type and thereby simulating a trigger, the calculation formula being:
α_s^(k) = exp(h_k·t_1^T) / Σ_j exp(h_j·t_1^T)
wherein h_k is the kth part of the output vector h, α_s^(k) is the kth part of the local attention vector α_s, and t_1^T is the transpose of the event type embedding vector;
(4.3) for all word segments in each sentence, calculating the weight of the sentence containing the word segment within its document with a global attention mechanism, obtaining the specific meaning of the trigger in this context, assisting in judging the event type of the sentence, and resolving trigger ambiguity with context information, the calculation formula being:
α_d^(k) = exp(h_k·t_2^T + h_k·d^T) / Σ_j exp(h_j·t_2^T + h_j·d^T)
wherein h_k is the kth part of the output vector h, α_d^(k) is the kth part of the global attention vector α_d, t_2^T is the transpose of the event type embedding vector, and d^T is the transpose of the document-level embedded vector;
(4.4) weighting and fusing the local attention and the global attention to improve event detection accuracy, the formulas being:
v_s = α_s·t_1
v_d = α_d·t_2
o = σ(λ·v_s + (1-λ)·v_d)
wherein the final output value o consists of two parts, v_s and v_d: v_s is generated by the dot product of α_s and t_1 and captures local features while simulating a hidden event trigger; v_d is generated by the dot product of α_d and t_2 and captures global features and context information; σ is the Sigmoid function, and λ ∈ [0,1] is a hyper-parameter that trades off v_s against v_d.
6. The dam-operation-log-based emergency working condition event detection method according to claim 1, wherein in step (5), the data set is processed in sentence units, <sentence, event type> pairs constitute the training data and indicate whether a given sentence conveys an event of type t, with a label of 1 or 0, and a Focal loss function is introduced to strengthen the influence of positive samples and hard samples on the model, the calculation formula being:
J(θ) = -Σ_i [ β·y^(i)·(1 - o(x^(i)))^γ·log(o(x^(i))) + (1-β)·(1 - y^(i))·(o(x^(i)))^γ·log(1 - o(x^(i))) ] + δ·||θ||²
wherein x consists of the sentence and the target event type, y ∈ {0,1}, o(x^(i)) is the predicted value of the model, ||θ||² is the sum of squares of the individual elements of the model parameters, δ > 0 is the weight of the L2 regularization term, β is a parameter balancing the weight ratio of positive and negative samples, and γ is a parameter balancing the weight ratio of hard-to-classify and easy-to-classify samples;
finally, event detection is performed on the dam operation log files with the trained model, and classification is performed based on the event types contained in each document.
7. An emergency working condition event detection system based on dam operation logs, for performing event detection on dam operation logs in the water conservancy field, characterized by comprising:
a log file preprocessing module: first sort and split the dam operation logs by recording date, label each document, order, label and segment the sentences in each document, perform entity-type labeling and part-of-speech labeling on each word segment, and then construct a dam emergency working condition event type set;
an encoding vector embedding module: encode all word segments in the dam operation log with an ALBERT pre-training model and convert them into the embedded vectors corresponding to the word segments;
a BiLSTM feature fusion module: use a BiLSTM to fuse the embedded vector, named-entity-type vector and part-of-speech tagging vector corresponding to each word segment, strengthening the semantic information of the word segments;
a dual-attention semantic reinforcement module: fuse context information with sentence-document dual attention;
a model: trained with a Focal loss function; the trained model classifies the events to which all documents belong.
8. A computer device, characterized in that: the computer device comprises a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the dam-operation-log-based emergency working condition event detection method according to any one of claims 1 to 6.
9. A computer-readable storage medium, characterized in that: the computer-readable storage medium stores a computer program for executing the dam-operation-log-based emergency working condition event detection method according to any one of claims 1 to 6.
CN202111202004.4A 2021-10-15 2021-10-15 Emergency working condition event detection method based on dam operation log Active CN113901815B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111202004.4A CN113901815B (en) 2021-10-15 2021-10-15 Emergency working condition event detection method based on dam operation log

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111202004.4A CN113901815B (en) 2021-10-15 2021-10-15 Emergency working condition event detection method based on dam operation log

Publications (2)

Publication Number Publication Date
CN113901815A true CN113901815A (en) 2022-01-07
CN113901815B CN113901815B (en) 2023-05-05

Family

ID=79192213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111202004.4A Active CN113901815B (en) 2021-10-15 2021-10-15 Emergency working condition event detection method based on dam operation log

Country Status (1)

Country Link
CN (1) CN113901815B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082898A (en) * 2022-07-04 2022-09-20 小米汽车科技有限公司 Obstacle detection method, obstacle detection device, vehicle, and storage medium
CN116738366A (en) * 2023-06-16 2023-09-12 河海大学 Method and system for identifying causal relationship of dam emergency event based on feature fusion

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014094332A1 (en) * 2012-12-21 2014-06-26 东莞中国科学院云计算产业技术创新与育成中心 Method for creating knowledge base engine for emergency management of sudden event and method for querying in knowledge base engine
CN110135457A (en) * 2019-04-11 2019-08-16 中国科学院计算技术研究所 Event trigger word abstracting method and system based on self-encoding encoder fusion document information
CN111881258A (en) * 2020-07-28 2020-11-03 广东工业大学 Self-learning event extraction method and application thereof
CN112612871A (en) * 2020-12-17 2021-04-06 浙江大学 Multi-event detection method based on sequence generation model
CN112765952A (en) * 2020-12-28 2021-05-07 大连理工大学 Conditional probability combined event extraction method under graph convolution attention mechanism
CN113312500A (en) * 2021-06-24 2021-08-27 河海大学 Method for constructing event map for safe operation of dam

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014094332A1 (en) * 2012-12-21 2014-06-26 东莞中国科学院云计算产业技术创新与育成中心 Method for creating knowledge base engine for emergency management of sudden event and method for querying in knowledge base engine
CN110135457A (en) * 2019-04-11 2019-08-16 中国科学院计算技术研究所 Event trigger word abstracting method and system based on self-encoding encoder fusion document information
CN111881258A (en) * 2020-07-28 2020-11-03 广东工业大学 Self-learning event extraction method and application thereof
CN112612871A (en) * 2020-12-17 2021-04-06 浙江大学 Multi-event detection method based on sequence generation model
CN112765952A (en) * 2020-12-28 2021-05-07 大连理工大学 Conditional probability combined event extraction method under graph convolution attention mechanism
CN113312500A (en) * 2021-06-24 2021-08-27 河海大学 Method for constructing event map for safe operation of dam

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
GUOYAN XU ET AL: "A Dam Deformation Prediction Model Based on ARIMA-LSTM", 《2020 IEEE SIXTH INTERNATIONAL CONFERENCE ON BIG DATA COMPUTING SERVICE AND APPLICATIONS 》 *
易士翔: "Research on Web News Event Extraction Based on Deep Learning and Embedded Feature Space", China Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115082898A (en) * 2022-07-04 2022-09-20 小米汽车科技有限公司 Obstacle detection method, obstacle detection device, vehicle, and storage medium
CN116738366A (en) * 2023-06-16 2023-09-12 河海大学 Method and system for identifying causal relationship of dam emergency event based on feature fusion

Also Published As

Publication number Publication date
CN113901815B (en) 2023-05-05

Similar Documents

Publication Publication Date Title
CN109800310B (en) Electric power operation and maintenance text analysis method based on structured expression
CN112487203A (en) Relation extraction system integrated with dynamic word vectors
Jiang et al. De-identification of medical records using conditional random fields and long short-term memory networks
Krasnowska-Kieraś et al. Empirical linguistic study of sentence embeddings
CN113901815A (en) Emergency working condition event detection method based on dam operation log
CN112395876B (en) Knowledge distillation and multitask learning-based chapter relationship identification method and device
CN112434161B (en) Aspect-level emotion analysis method adopting bidirectional long-short term memory network
CN111475650A (en) Russian semantic role labeling method, system, device and storage medium
CN113312914A (en) Safety event entity identification method based on pre-training model
CN115455202A (en) Emergency event affair map construction method
Mishra et al. Memotion 3: Dataset on sentiment and emotion analysis of codemixed hindi-english memes
Lee et al. Detecting suicidality with a contextual graph neural network
Yela-Bello et al. MultiHumES: Multilingual humanitarian dataset for extractive summarization
Gao et al. ABCD: A graph framework to convert complex sentences to a covering set of simple sentences
CN115730071A (en) Electric power public opinion event extraction method and device, electronic equipment and storage medium
CN116186241A (en) Event element extraction method and device based on semantic analysis and prompt learning, electronic equipment and storage medium
Zhang et al. Sentiment identification by incorporating syntax, semantics and context information
Hourali et al. Coreference resolution using neural mcdm and fuzzy weighting technique
CN111815426B (en) Data processing method and terminal related to financial investment and research
Ullah et al. Unveiling the Power of Deep Learning: A Comparative Study of LSTM, BERT, and GRU for Disaster Tweet Classification
CN113191160A (en) Emotion analysis method for knowledge perception
Lv et al. Automatic key-phrase extraction to support the understanding of infrastructure disaster resilience
Sirirattanajakarin et al. BoydCut: Bidirectional LSTM-CNN Model for Thai Sentence Segmenter
Zhang et al. A deep-learning method for evaluating semantically-rich building code annotations
Roze et al. Which aspects of discourse relations are hard to learn? Primitive decomposition for discourse relation classification

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant