CN114385795A - Accident information extraction method and device and electronic equipment - Google Patents

Accident information extraction method and device and electronic equipment

Info

Publication number
CN114385795A
Authority
CN
China
Prior art keywords
character
text
accident
information
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110896545.5A
Other languages
Chinese (zh)
Inventor
杨继星
房玉东
邢晓毅
柳树林
张卫伟
边路
狄瑞晟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication And Information Center Of Emergency Management Department
Original Assignee
Communication And Information Center Of Emergency Management Department
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Communication And Information Center Of Emergency Management Department filed Critical Communication And Information Center Of Emergency Management Department
Priority to CN202110896545.5A
Publication of CN114385795A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3343Query execution using phonetics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3346Query execution using probabilistic model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide an accident information extraction method and apparatus and an electronic device, applied to the technical field of emergency management. The method comprises the following steps: acquiring an accident report text, where the accident report text is text generated based on an accident-reporting dialogue for the accident to be processed; performing accident information classification on the accident report text by using a pre-trained information classification model to obtain the effective text segments in the accident report text that each contain effective information and the accident information category to which each effective text segment belongs; and, for each effective text segment in the accident report text, processing the text segment by using the information extraction operation corresponding to the accident information category to which the text segment belongs, based on a preset correspondence between accident information categories and text processing operations, to obtain the accident information of the accident to be processed recorded in the text segment. With this scheme, the efficiency of determining accident information can be improved.

Description

Accident information extraction method and device and electronic equipment
Technical Field
The invention relates to the technical field of emergency management, in particular to an accident information extraction method and device and electronic equipment.
Background
In recent years, with the rise in social and economic activity, accidents of all kinds have become more frequent, adversely affecting social stability and harmony. Classified by accident type, sudden accidents mainly include work-safety accidents in industrial, mining and commercial trades, transportation accidents, building fires and the like, and are generally composed of several accident elements, for example: the time of occurrence, the place of occurrence, the number of injured persons, the number of dead persons, the economic loss, and so on.
In order to respond quickly to an emergency, after receiving an emergency call an operator of the accident report-receiving platform needs to analyze and judge the situation of the emergency from the conversation with the caller and determine the accident information of the emergency, so as to match and start a corresponding emergency plan. However, the number of accidents reported daily in large and medium-sized cities can reach the hundreds, and determining accident information purely by manual processing is inefficient.
Disclosure of Invention
The embodiment of the invention aims to provide an accident information extraction method, an accident information extraction device and electronic equipment so as to improve the accident information determination efficiency. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides an accident information extraction method, where the method includes:
acquiring an accident report text; where the accident report text is: text generated based on an accident-reporting dialogue for the accident to be processed;
performing accident information classification on the accident report text by using a pre-trained information classification model to obtain effective text segments that each contain effective information in the accident report text and the accident information category to which each effective text segment belongs; where the information classification model is: a model obtained by training based on sample texts and corresponding labeling results, the labeling result being used to indicate: each effective text segment in the sample text and the accident information category to which each effective text segment belongs;
and, for each effective text segment in the accident report text, processing the text segment by using the information extraction operation corresponding to the accident information category to which the text segment belongs, based on a preset correspondence between accident information categories and text processing operations, to obtain the accident information of the accident to be processed recorded in the text segment.
Optionally, the information classification model is trained according to the following method:
acquiring a target sample text from the training sample set;
inputting the target sample text into a neural network model to be trained, so that the neural network model classifies accident information of the target sample text to obtain a prediction classification result; wherein the prediction classification result comprises: each effective text segment in the target sample text and the accident information category to which each effective text segment belongs;
adjusting network parameters of the neural network model based on the difference between the prediction classification result and the calibration result of the target sample text; and returning to the step of obtaining the target sample text from the training sample set.
Optionally, the labeling result corresponding to each sample text is: the accident information category of each character in the effective text segment of the sample text;
the neural network model obtains a prediction classification result in the following mode:
converting each character in the target sample text into a character vector corresponding to the character;
based on the character order of the characters in the sample text, sequentially performing recurrent processing on the character vector of each character to obtain an initial probability vector of each character; where the value of each dimension in the probability vector of a character represents: the probability that the character belongs to the accident information category corresponding to that dimension;
adjusting the initial probability vector of each character according to constraint conditions to obtain a target probability vector; where the constraint conditions are obtained by the neural network model through learning from historical training data;
and determining the accident information category to which each character belongs as a prediction classification result based on the target probability vector of each character.
Optionally, the performing, in sequence, cyclic processing on the character vectors of the characters based on the character sequence of the characters in the sample text to obtain an initial probability vector of each character includes:
for each character in the characters, obtaining an initial probability vector of the character by adopting the following method, wherein the method comprises the following steps:
acquiring the preamble feature and the previous-character feature of the character according to the forward character order of the characters in the sample text; where the preamble feature is a feature extracted based on all characters preceding the character, and the previous-character feature is: a first character feature extracted based on the character immediately preceding the character;
calculating a first character feature of the character based on the preamble feature, the previous-character feature and the character vector of the character, and determining an initial probability vector of the character based on the first character feature of the character.
Optionally, after the calculating of the first character feature of the character based on the preamble feature, the previous-character feature and the character vector of the character, the method further includes:
updating the preamble feature based on the previous-character feature and the character vector of the character.
Optionally, before determining the initial probability vector of the character based on the first character feature of the character, the method further includes:
acquiring the postamble feature and the next-character feature of the character according to the reverse character order of the characters in the sample text; where the postamble feature is a feature extracted based on all characters following the character, and the next-character feature is: a second character feature extracted based on the character immediately following the character;
calculating a second character feature of the character based on the postamble feature, the next-character feature and the character vector of the character;
determining an initial probability vector of the character based on the first character feature of the character, including:
an initial probability vector for the character is determined based on a first character feature of the character and a second character feature of the character.
Optionally, the adjusting the network parameters of the neural network model based on the difference between the prediction classification result and the calibration result of the target sample text includes:
calculating a loss function value of the neural network model based on the prediction classification result and the calibration result of the target sample text as a difference between the prediction classification result and the calibration result of the target sample text;
adjusting network parameters of the neural network model based on the loss function values.
Optionally, the number of the prediction classification results is multiple;
the calculating a loss function value of the neural network model based on the prediction classification result and the calibration result of the target sample text comprises:
calculating a result score for each predicted classification result;
adding the result scores of all the prediction classification results to obtain a prediction score of the neural network model;
and calculating a loss function value of the neural network model based on the prediction score and the marking score corresponding to the calibration result of the target sample text.
Optionally, the calculating the result score of each predicted classification result includes:
the result score of each prediction classification result is calculated using the following formulas:

P_i = e^{S_i}

S_i = Σ_j ( E_{j, y_j^i} + T_{y_{j-1}^i, y_j^i} )

where P_i is the result score of the i-th prediction result; E_{j, y_j^i} represents the probability of the accident information category to which the j-th character belongs in the i-th prediction result; and T_{y_{j-1}^i, y_j^i} represents the jump probability between the accident information category to which the (j-1)-th character belongs and the accident information category to which the j-th character belongs in the i-th prediction result.
Optionally, the calculating a loss function value of the neural network model based on the prediction score and the annotation score corresponding to the calibration result of the target sample text includes:
calculating a loss function value of the neural network model according to the following formula:
LossFunction = -log( P_RealPath / P_total )

where LossFunction is the loss function value, P_RealPath is the annotation score corresponding to the calibration result of the target sample text, and P_total is the prediction score.
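Purely as an illustrative sketch (not part of the patented scheme), the following Python code shows how the result score of one predicted label path and the loss described above could be computed from an emission matrix and a transition matrix; the names emissions, transitions and labels are assumptions introduced here, and the exponential/log form follows the standard linear-chain CRF formulation assumed in the reconstruction above. Enumerating all paths is done only for clarity; in practice P_total is computed with the forward algorithm.

```python
import numpy as np

def path_score(emissions, transitions, labels):
    """Result score P_i = exp(S_i) of one predicted label path.

    emissions:   (n, num_tags) per-character category scores (E)
    transitions: (num_tags, num_tags) jump scores between categories (T)
    labels:      length-n list of tag indices forming the path
    """
    s = emissions[0, labels[0]]
    for j in range(1, len(labels)):
        s += emissions[j, labels[j]] + transitions[labels[j - 1], labels[j]]
    return np.exp(s)

def crf_loss(emissions, transitions, all_paths, real_path):
    """LossFunction = -log(P_RealPath / P_total), with P_total the sum of all path scores."""
    p_total = sum(path_score(emissions, transitions, p) for p in all_paths)
    p_real = path_score(emissions, transitions, real_path)
    return -np.log(p_real / p_total)
```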
Optionally, the labeling result corresponding to each sample text is: a labeling result obtained by labeling the sample text in a BIO labeling manner.
Optionally, the accident report text is generated based on the accident-reporting dialogue for the accident to be processed in the following manner:
performing voice recognition on the accident-reporting dialogue for the accident to be processed to generate the accident report text.
Optionally, after the information extraction operation corresponding to the accident information category to which the text segment belongs is adopted to process the text segment based on the preset corresponding relationship between the accident information category and the text processing operation for each effective text segment in the accident reporting text, and the accident information of the accident to be processed recorded in the text segment is obtained, the method further includes:
based on the obtained accident information, an accident information report table is generated.
In a second aspect, an embodiment of the present invention provides an accident information extraction apparatus, where the apparatus includes:
a text acquisition module, configured to acquire an accident report text; where the accident report text is: text generated based on an accident-reporting dialogue for the accident to be processed;
the information classification module is used for classifying the accident information of the accident receiving text by utilizing a pre-trained information classification model to obtain effective text sections containing effective information in the accident receiving text and the accident information category of each effective text section; wherein, the information classification model is as follows: training based on the sample text and the corresponding labeling result, wherein the labeling result is used for indicating: each effective text segment in the sample text and the accident information category to which each effective text segment belongs;
and the information extraction module is used for processing each effective text segment in the accident reporting text by adopting the information extraction operation corresponding to the accident information category to which the text segment belongs based on the corresponding relation between the preset accident information category and the text processing operation so as to obtain the accident information of the to-be-processed accident recorded by the text segment.
Optionally, the information classification model is trained according to the following modules:
the text acquisition module is used for acquiring a target sample text from the training sample set;
the text input module is used for inputting the target sample text into a neural network model to be trained so that the neural network model can classify the accident information of the target sample text to obtain a prediction classification result; wherein the prediction classification result comprises: each effective text segment in the target sample text and the accident information category to which each effective text segment belongs;
the parameter adjusting module is used for adjusting network parameters of the neural network model based on the difference between the prediction classification result and the calibration result of the target sample text; and returning to the step of obtaining the target sample text from the training sample set.
Optionally, the labeling result corresponding to each sample text is: the accident information category of each character in the effective text segment of the sample text;
the neural network model comprises:
the vector conversion module is used for converting each character in the target sample text into a character vector corresponding to the character;
the cyclic processing module is used for sequentially carrying out cyclic processing on the character vectors of the characters based on the character sequence of the characters in the sample text to obtain an initial probability vector of each character; wherein, the numerical value of each dimension in the probability vector of each character is characterized by: a probability that the character belongs to an accident information type corresponding to the dimension;
the probability adjusting module is used for adjusting the initial probability vector of each character according to the constraint condition; obtaining a target probability vector; wherein the constraint condition is obtained by learning the neural network model through historical training data;
and the class prediction module is used for determining the accident information class to which each character belongs based on the target probability vector of each character as a prediction classification result.
Optionally, the loop processing module is specifically configured to obtain, for each of the characters, the initial probability vector of the character in the following manner: acquiring the preamble feature and the previous-character feature of the character according to the forward character order of the characters in the sample text, where the preamble feature is a feature extracted based on all characters preceding the character, and the previous-character feature is: a first character feature extracted based on the character immediately preceding the character; calculating a first character feature of the character based on the preamble feature, the previous-character feature and the character vector of the character, and determining an initial probability vector of the character based on the first character feature of the character.
Optionally, the loop processing module is further configured to, after the first character feature of the character is calculated based on the preamble feature, the previous-character feature and the character vector of the character, update the preamble feature based on the previous-character feature and the character vector of the character.
Optionally, the loop processing module is further configured to, before the initial probability vector of the character is determined based on the first character feature of the character, acquire the postamble feature and the next-character feature of the character according to the reverse character order of the characters in the sample text, where the postamble feature is a feature extracted based on all characters following the character, and the next-character feature is: a second character feature extracted based on the character immediately following the character; and calculate a second character feature of the character based on the postamble feature, the next-character feature and the character vector of the character;
the loop processing module is specifically configured to determine an initial probability vector of the character based on a first character feature of the character and a second character feature of the character.
Optionally, the parameter adjusting module is specifically configured to calculate a loss function value of the neural network model based on the prediction classification result and the calibration result of the target sample text, as a difference between the prediction classification result and the calibration result of the target sample text; adjusting network parameters of the neural network model based on the loss function values.
Optionally, the prediction classification result is multiple;
the parameter adjusting module is specifically configured to calculate the result score of each prediction classification result; add the result scores of all the prediction classification results to obtain a prediction score of the neural network model; and calculate a loss function value of the neural network model based on the prediction score and the annotation score corresponding to the calibration result of the target sample text.
Optionally, the parameter adjusting module is specifically configured to:
the result score of each prediction classification result is calculated using the following formulas:

P_i = e^{S_i}

S_i = Σ_j ( E_{j, y_j^i} + T_{y_{j-1}^i, y_j^i} )

where P_i is the result score of the i-th prediction result; E_{j, y_j^i} represents the probability of the accident information category to which the j-th character belongs in the i-th prediction result; and T_{y_{j-1}^i, y_j^i} represents the jump probability between the accident information category to which the (j-1)-th character belongs and the accident information category to which the j-th character belongs in the i-th prediction result.
Optionally, the parameter adjusting module is specifically configured to calculate a loss function value of the neural network model according to the following formula:
LossFunction = -log( P_RealPath / P_total )

where LossFunction is the loss function value, P_RealPath is the annotation score corresponding to the calibration result of the target sample text, and P_total is the prediction score.
Optionally, the labeling result corresponding to each sample text is: a labeling result obtained by labeling the sample text in a BIO labeling manner.
Optionally, the apparatus further comprises: an accident report text generation module, configured to generate the accident report text based on the accident-reporting dialogue for the accident to be processed in the following manner: performing voice recognition on the accident-reporting dialogue for the accident to be processed to generate the accident report text.
Optionally, the apparatus further comprises: a report table generation module, configured to, after the information extraction module has, for each effective text segment in the accident report text, processed the text segment with the information extraction operation corresponding to the accident information category to which the text segment belongs based on the preset correspondence between accident information categories and text processing operations and obtained the accident information of the accident to be processed recorded in the text segment, generate an accident information report table based on the obtained accident information.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of the first aspect when executing a program stored in the memory.
In a fourth aspect, the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method steps of any one of the first aspect.
The embodiment of the invention has the following beneficial effects:
according to the accident information extraction method provided by the embodiment of the invention, the accident report text of the accident to be processed can be acquired; accident information classification is then performed on the accident report text with the pre-trained information classification model to obtain the effective text segments in the accident report text that each contain effective information and the accident information category to which each effective text segment belongs; and, for each effective text segment in the accident report text, the text segment is processed with the information extraction operation corresponding to the accident information category to which it belongs, according to the preset correspondence between accident information categories and text processing operations, to obtain the accident information of the accident to be processed recorded in the text segment. It can be seen that, with this scheme, accident information is extracted automatically rather than purely by manual processing, so the efficiency of determining accident information can be improved.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
Fig. 1 is a flowchart of an accident information extraction method according to an embodiment of the present invention;
FIG. 2 is another flow chart of an accident information extraction method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an incident report text according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a neural network model provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of a single LSTM structure provided by an embodiment of the present invention;
FIG. 6 is another flowchart of an accident information extraction method according to an embodiment of the present invention;
FIG. 7 is another flowchart of an accident information extraction method according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating an information report table according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an accident information extraction apparatus according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In recent years, with the rise in social and economic activity, accidents of all kinds have become more frequent, adversely affecting social stability and harmony. Classified by accident type, sudden accidents mainly include work-safety accidents in industrial, mining and commercial trades, transportation accidents, building fires and the like, and are generally composed of several accident elements, for example: the time of occurrence, the place of occurrence, the number of injured persons, the number of dead persons, the economic loss, and so on.
In order to respond quickly to an emergency, after receiving an emergency call an operator of the accident report-receiving platform needs to analyze and judge the situation of the emergency from the conversation with the caller and determine the accident information of the emergency, so as to match and start a corresponding emergency plan. However, the number of accidents reported daily in large and medium-sized cities can reach the hundreds, and determining accident information purely by manual processing is inefficient.
In order to improve the efficiency of accident information determination, the embodiment of the invention provides an accident information extraction method and device and electronic equipment.
The embodiment of the invention can be applied to various electronic devices, such as personal computers, servers, mobile phones and other devices with data processing capability. Moreover, the accident information extraction method provided by the embodiment of the invention can be realized in a software, hardware or software and hardware combination mode.
In an embodiment, the electronic device applying the accident information extraction method provided by the invention can be pre-deployed with the pre-trained information classification model. Or, in another embodiment, the pre-trained information classification model may also be deployed in a cloud, and the electronic device applying the accident information extraction method provided by the present invention may communicate with the cloud. In another embodiment, the execution subject of the present invention may be deployed in an accident report-receiving platform, for example a server, a terminal device, or the like in the report-receiving platform.
The accident information extraction method provided by the embodiment of the invention can comprise the following steps:
acquiring an accident report text; where the accident report text is: text generated based on an accident-reporting dialogue for the accident to be processed;
performing accident information classification on the accident report text by using a pre-trained information classification model to obtain effective text segments that each contain effective information in the accident report text and the accident information category to which each effective text segment belongs; where the information classification model is: a model obtained by training based on sample texts and corresponding labeling results, the labeling result being used to indicate: each effective text segment in the sample text and the accident information category to which each effective text segment belongs;
and, for each effective text segment in the accident report text, processing the text segment by using the information extraction operation corresponding to the accident information category to which the text segment belongs, based on a preset correspondence between accident information categories and text processing operations, to obtain the accident information of the accident to be processed recorded in the text segment.
According to the technical scheme of the embodiment of the invention, the accident report receiving text of the accident to be processed can be obtained, then the accident information classification is carried out on the accident report receiving text by utilizing the pre-trained information classification model, the effective text sections containing the effective information in the accident report receiving text and the accident information category to which each effective text section belongs are obtained, and the text sections are processed by adopting the information extraction operation corresponding to the accident information category to which the text sections belong according to the corresponding relation between the preset accident information category and the text processing operation aiming at each effective text section in the accident report receiving text, so that the accident information of the accident to be processed recorded in the text sections can be obtained.
The following describes an accident information extraction method provided by an embodiment of the present invention in detail with reference to the accompanying drawings.
As shown in fig. 1, an accident information extraction method provided in an embodiment of the present invention may include the following steps:
s101, acquiring an accident receiving text;
wherein, the accident reporting file is as follows: text generated based on an incident reporting dialog for the incident to be processed.
The accident-reporting dialogue can be a call recording of the accident report-receiving platform. The dialogue content can be what the caller describes about the accident to be processed after the report-receiving platform receives the caller's emergency call. For example: "There is a fire in the XX residential community, come and put it out quickly."
In one implementation, the following method may be adopted to generate the accident reporting text based on the accident reporting dialog for the accident to be processed, including:
and carrying out voice recognition on the accident reporting conversation aiming at the accident to be processed to generate an accident reporting text.
The voice content of the accident reporting conversation can be converted into text content through a voice recognition offline SDK (Software Development Kit), so as to generate an accident reporting text. Or the accident reporting receiving conversation can be uploaded to the cloud end in an online voice recognition mode, and the device with the voice recognition function, which is deployed at the cloud end, performs voice recognition on the accident reporting receiving conversation aiming at the accident to be processed to generate an accident reporting receiving text.
In one implementation, the text content generated from the voice content of the accident reporting conversation often includes a multi-person conversation, such as an operator query text, an alarm person statement text, and the like. For the embodiment of the invention, the accident information required to be determined is stated by the alarm personnel, so after the voice recognition is carried out on the accident reporting conversation, the statement texts of the alarm personnel can be combined into a section of conversation to be used as the accident reporting text.
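A minimal sketch of this merging step is shown below; it assumes the speech-recognition output has already been separated into (speaker, utterance) pairs, and the speaker tag "caller" is a hypothetical label for the reporting person.

```python
def build_report_text(transcript):
    """Keep only the caller's statements and join them into one accident report text.

    transcript: list of (speaker, utterance) tuples produced by speech recognition,
                e.g. [("operator", "..."), ("caller", "...")].
    """
    caller_parts = [utt for spk, utt in transcript if spk == "caller"]
    return "".join(caller_parts)  # Chinese text needs no separator between segments
```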
If the accident reporting text is generated by the execution main body of the invention, the generation process of the accident reporting text can be understood as the acquisition process of the accident reporting text. If the accident reporting text is generated by the electronic equipment except the execution main body, the execution main body can acquire the accident reporting text from the electronic equipment which generates the accident reporting text.
S102, performing accident information classification on the accident report text by using a pre-trained information classification model to obtain effective text segments that each contain effective information in the accident report text and the accident information category to which each effective text segment belongs; where the information classification model is: a model obtained by training based on sample texts and corresponding labeling results, the labeling result being used to indicate: each effective text segment in the sample text and the accident information category to which each effective text segment belongs;
the accident information classification may include an accident type and an accident element, wherein the accident type may include 6 types, which are dangerous chemical, mine, traffic, building construction, fire and industry, commerce and trade respectively. And the accident element may include 4 types of elements, which are the occurrence time, the occurrence place, the number of injured persons, and the number of dead persons, respectively.
An effective text segment in the text is a text segment containing effective information, and effective information can be understood as accident information. For example, the accident report text is: "Just after 4 o'clock, a tank truck exploded at the WL section of the S-hai expressway, and there are casualties at the scene." In this dialogue, "just after 4 o'clock" contains the accident time, "WL section of the S-hai expressway" contains the accident site, and "exploded" contains the accident type, so "just after 4 o'clock", "WL section of the S-hai expressway" and "exploded" are effective text segments containing effective information, and the text segments other than the effective text segments are ineffective text segments containing ineffective information.
In order to improve the efficiency of determining accident information, after the accident report text is acquired, the effective text segments in the accident report text and the accident information category to which each effective text segment belongs need to be determined. Based on this, the embodiment of the present invention trains the information classification model based on sample texts and corresponding labeling results, where the labeling result is used to indicate: each effective text segment in the sample text and the accident information category to which each effective text segment belongs. Therefore, the trained information classification model can classify an input text and determine the effective text segments in the text and the accident information category to which each effective text segment belongs. The specific training process will be described later and is not repeated here.
S103, aiming at each effective text segment in the accident reporting text, based on the corresponding relation between the preset accident information category and the text processing operation, the text segment is processed by adopting the information extraction operation corresponding to the accident information category to which the text segment belongs, and the accident information of the accident to be processed recorded by the text segment is obtained.
Because the text in the text of the accident reporting is biased to spoken language, the accident information contained in the text of the accident reporting is unclear and nonstandard, and therefore, after the effective text segments containing the effective information in the text of the accident reporting and the accident information category to which each effective text segment belongs are determined, the effective text segments need to be further processed, and the accident information of the accident to be processed recorded in the text segments is determined.
The correspondence between the preset accident information category and the text processing operation may be set based on experience and demand.
Illustratively, for the accident-time accident information category, the corresponding information extraction operation is as follows: format conversion is performed on the extracted time elements. Conversations generally mention approximate time expressions such as "today", "morning", "afternoon", "evening", "around a few minutes past", "just now", and the like; the accident occurrence time is converted into the format YYYY-MM-dd hh:mm:ss by using regular expressions.
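As an illustrative sketch only (the concrete regular expressions are not given in this text), the following Python snippet shows one way such colloquial time expressions could be normalized to the YYYY-MM-dd hh:mm:ss format; the mapping rules and keywords are assumptions.

```python
import re
from datetime import datetime, timedelta

def normalize_time(text, now=None):
    """Convert colloquial time mentions into 'YYYY-MM-dd hh:mm:ss' (illustrative rules only)."""
    now = now or datetime.now()
    day = now
    if re.search(r"yesterday|昨天", text):
        day = now - timedelta(days=1)
    hour, minute = now.hour, now.minute
    m = re.search(r"(\d{1,2})\s*(?:o'clock|点)", text)       # e.g. "4 o'clock" / "4点"
    if m:
        hour, minute = int(m.group(1)), 0
        if re.search(r"afternoon|evening|下午|晚上", text) and hour < 12:
            hour += 12                                        # crude am/pm guess
    return day.strftime(f"%Y-%m-%d {hour:02d}:{minute:02d}:00")
```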
For another example, for the accident-site accident information category, the corresponding information extraction operation is as follows: a detailed gazetteer of place names within the jurisdiction is built in advance and stored in a tree structure; after the text segment corresponding to the accident site in the accident report text is obtained, word segmentation is performed with a Chinese word segmentation library, place nouns are filtered out according to the part-of-speech tags in the result, the place nouns are searched and matched against the gazetteer, and finally the accident site is returned in the format "XX province, XX city, XX district, XX street, XX house number".
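The sketch below illustrates this step with the jieba library (one possible Chinese word segmentation library; this text does not name a specific one). The gazetteer structure and lookup are simplified assumptions, using a flat dictionary instead of the tree described above.

```python
import jieba.posseg as pseg

# Hypothetical, simplified gazetteer: place-name fragment -> fully qualified address.
GAZETTEER = {
    "XX街道": "XX省XX市XX区XX街道",
}

def extract_place(segment):
    """Segment the text, keep place nouns (POS tag 'ns'), and match them against the gazetteer."""
    place_words = [word for word, flag in pseg.cut(segment) if flag.startswith("ns")]
    for word in place_words:
        if word in GAZETTEER:
            return GAZETTEER[word]
    return "".join(place_words) or None   # fall back to the raw place nouns
```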
For another example, for the number-of-deaths accident information category, the corresponding information extraction operation is: after the text segment corresponding to the number of deaths in the accident report text is obtained, the number of deaths in the text segment is converted into Arabic-numeral format. If the extracted number is in Chinese format (e.g. "twenty-one"), it is matched with a regular expression over Chinese digit characters (one, two, ..., nine) and unit characters (ten, hundred, thousand, ten thousand, ...) and then converted into Arabic-numeral format.
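A minimal sketch of such a conversion is shown below, handling only the small Chinese numerals that typically occur in casualty counts; the exact regular expression used by the scheme is not reproduced here and the pattern below is an assumption.

```python
import re

DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}

def chinese_to_arabic(text):
    """Convert small Chinese numerals such as '二十一' (twenty-one) to 21."""
    m = re.search(r"[零一二两三四五六七八九十百]+", text)
    if not m:
        return None
    value, unit_value = 0, 0
    for ch in m.group(0):
        if ch in DIGITS:
            unit_value = DIGITS[ch]
        elif ch == "十":
            value += (unit_value or 1) * 10   # e.g. '二十' -> 20, bare '十' -> 10
            unit_value = 0
        elif ch == "百":
            value += (unit_value or 1) * 100
            unit_value = 0
    return value + unit_value
```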
according to the technical scheme of the embodiment of the invention, the accident report receiving text of the accident to be processed can be obtained, then the accident information classification is carried out on the accident report receiving text by utilizing the pre-trained information classification model, the effective text sections containing the effective information in the accident report receiving text and the accident information category to which each effective text section belongs are obtained, and the text sections are processed by adopting the information extraction operation corresponding to the accident information category to which the text sections belong according to the corresponding relation between the preset accident information category and the text processing operation aiming at each effective text section in the accident report receiving text, so that the accident information of the accident to be processed recorded in the text sections can be obtained.
Based on the embodiment shown in fig. 1, as shown in fig. 2, the embodiment further provides an accident information extraction method, which trains an information classification model according to the following steps:
s201, acquiring a target sample text from a training sample set;
the construction process of the training sample set comprises the following steps: collecting a large number of accident receiving conversations, converting the conversations into sample texts through a voice recognition technology, and storing the sample texts in an initial corpus; further, each sample text in the initial pre-material library is labeled, each effective text segment in the sample text and the accident information category to which each effective text segment belongs are labeled, and a labeling result of the sample text is obtained.
In one implementation, the labeling result corresponding to each sample text is obtained by labeling the sample text in the BIO (Begin-Inside-Outside) manner. A character labeled B is the beginning of an effective text segment (an accident type or an accident element), a character labeled I is in the middle or at the end of an effective text segment, and a character labeled O belongs to an ineffective text segment. For example, as shown in FIG. 3, for the accident report text "Just after 4 o'clock, a tank truck exploded at the WL section of the S-hai expressway, and there are casualties at the scene", the segments "just after 4 o'clock", "WL section of the S-hai expressway" and "exploded" are effective text segments; the first character of each effective text segment is labeled B, the remaining characters of each effective text segment are labeled I, and all other characters are labeled O. Further, each character is also labeled with the accident information category to which its text segment belongs, such as time, place and type in FIG. 3.
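As a small illustration of this labeling scheme (not taken from the original filing), the helper below turns labeled spans into per-character BIO tags; the category names TIME and TYPE and the example sentence are assumptions introduced here.

```python
def bio_tags(text, segments):
    """Produce per-character BIO tags from labeled effective text segments.

    segments: list of (start, end, category) spans, e.g. [(0, 5, "TIME")];
              the category names are placeholders for illustration.
    """
    tags = ["O"] * len(text)
    for start, end, cat in segments:
        tags[start] = f"B-{cat}"
        for i in range(start + 1, end):
            tags[i] = f"I-{cat}"
    return list(zip(text, tags))

# Hypothetical example: "刚刚4点多有罐车爆炸" with a TIME span and a TYPE span.
print(bio_tags("刚刚4点多有罐车爆炸", [(0, 5, "TIME"), (8, 10, "TYPE")]))
```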
After each sample text is labeled, in one implementation, all labeled sample texts are used as samples in a training sample set, or in another implementation, all labeled sample texts can be divided into two parts according to a preset proportion, one part is used as a training sample set, and the other part is used as a testing sample set. For example, the training sample set accounts for 80% and the testing sample set accounts for 20%.
After the training sample set is determined, the target sample text can be obtained from the training sample set.
S202, inputting the target sample text into a neural network model to be trained, so that the neural network model classifies accident information of the target sample text to obtain a prediction classification result; wherein predicting the classification result comprises: each effective text segment in the target sample text and the accident information category to which each effective text segment belongs;
after the target sample text is obtained, the target sample text can be input into a neural network model to be trained, so that the neural network model can classify the accident information of the target sample text to obtain a prediction classification result. The following embodiments will be further described, and will not be described herein.
S203, adjusting network parameters of the neural network model based on the difference between the prediction classification result and the calibration result of the target sample text; and returning to the step of obtaining the target sample text from the training sample set.
This step will be further described in the following embodiments, which are not described herein again.
According to the scheme of the embodiment of the invention, the accident information determining efficiency can be improved. Furthermore, the neural network model can be trained through the target sample text, and then an information classification model can be obtained. Therefore, when the accident information needs to be extracted, the information classification model can be used for extracting effective text sections containing effective information in the accident reporting text and the accident information class to which each effective text section belongs, and then the accident information is determined based on the effective text sections. Therefore, the scheme provided by the embodiment provides a basis for improving the accident information determination efficiency.
Optionally, in an embodiment, the labeling result corresponding to each sample text is: the accident information category of each character in the effective text segment of the sample text;
in an embodiment, as shown in fig. 4, a schematic diagram of a neural network model provided in an embodiment of the present invention is shown. The neural network model includes: ALBERT, Bi-LSTM and CRF. The ALBERT is a lightweight pre-trained language representation model, and the specific function can be understood as converting a sentence into a vector form with semantic information, namely digitalization. The input of the ALBERT is characters (including Chinese characters, English words, numbers, punctuations and the like) of a news sentence, the content length is not more than 512, and the content is marked as n; the output is a calculated vector for each character, the vector dimension is 128, so the final output is n × 128 (x)1,x2,…,xn)。
The Bi-LSTM is a recurrent neural network composed of 2 × n units of identical structure, where n equals the length of the input data. Each unit consists of an input layer, a hidden layer and an output layer; the output of the first unit is used as the input of the second unit, and so on until the last unit finishes the forward pass; then computation proceeds from the last unit back towards the first unit until the backward pass is finished; the forward result and the backward result for the same input position are then added to obtain each output. Illustratively, FIG. 5 is a schematic diagram of a single LSTM unit provided by an embodiment of the present invention. The LSTM unit comprises 4 network layers, two of which use the sigmoid activation function and two of which use the hyperbolic tangent (tanh) activation function. In addition, gates such as the forget gate Γ_f and the update gate Γ_u shown in FIG. 5 control how the information is distributed; the gates are the most characteristic feature of the LSTM recurrent network and serve to retain information and filter out noise. x_i is the input of the i-th recurrent unit, which also receives the cell coefficient c_{i-1} and the activation value a_{i-1}; after computation the unit outputs y_i, the cell coefficient c_i and the activation value a_i, and c_i and a_i serve as inputs of the (i+1)-th recurrent unit. The whole process is as follows:

Γ_f = σ(W_f [a_{i-1}, x_i] + b_f)

Γ_u = σ(W_u [a_{i-1}, x_i] + b_u)

c̃_i = tanh(W_t [a_{i-1}, x_i] + b_t)

c_i = Γ_f ⊙ c_{i-1} + Γ_u ⊙ c̃_i

a_i = tanh(c_i)

y_i = a_i

where W_f, W_u and W_t are the weight coefficients of the corresponding steps, b_f, b_u and b_t are the offset coefficients, and Γ_f, Γ_u and c̃_i are intermediate variables generated in the computation.
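Purely as an illustration of the unit described above, a single forward step can be written as follows; it follows the simplified equations reconstructed here (no separate output-gate weight), and the weight shapes are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_i, a_prev, c_prev, Wf, bf, Wu, bu, Wt, bt):
    """One LSTM unit: inputs x_i, a_{i-1}, c_{i-1}; outputs y_i, a_i, c_i."""
    z = np.concatenate([a_prev, x_i])           # [a_{i-1}, x_i]
    gamma_f = sigmoid(Wf @ z + bf)              # forget gate
    gamma_u = sigmoid(Wu @ z + bu)              # update gate
    c_tilde = np.tanh(Wt @ z + bt)              # candidate cell state
    c_i = gamma_f * c_prev + gamma_u * c_tilde  # new cell coefficient
    a_i = np.tanh(c_i)                          # new activation value
    return a_i, a_i, c_i                        # y_i = a_i
```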
At this time, based on the embodiment shown in fig. 2, as shown in fig. 6, the present embodiment further provides an accident information extraction method, which obtains a prediction classification result by using the following steps, including:
s601, converting each character in the target sample text into a character vector corresponding to the character;
each character within the target sample text is converted by ALBERT into a 128-dimensional character vector corresponding to each character.
S602, sequentially carrying out cyclic processing on the character vectors of the characters based on the character sequence of the characters in the sample text to obtain an initial probability vector of each character; wherein, the numerical value of each dimension in the probability vector of each character is characterized by: a probability that the character belongs to an accident information type corresponding to the dimension;
in one implementation, for each character in the characters, obtaining an initial probability vector for the character by:
acquiring the preamble feature and the previous-character feature of each character according to the forward character order of the characters in the sample text; where the preamble feature is a feature extracted based on all characters preceding the character, and the previous-character feature is: a first character feature extracted based on the character immediately preceding the character;
based on the preamble feature, the previous-character feature and the character vector of the character, a first character feature of the character is calculated, and based on the first character feature of the character, an initial probability vector of the character is determined.
The forward character order of the characters in the sample text is the reading order of the sample text. Illustratively, if the sample text is "A traffic accident occurred at the XX intersection at four o'clock today", then in the forward character order the characters are input one by one in reading order; correspondingly, in the reverse character order the same characters are input one by one starting from the last character and ending with the first.
Optionally, in an implementation manner, after calculating the first character feature of the character based on the preamble feature, the previous character feature, and the character vector of the character, the method further includes:
the preamble features are updated based on the preamble features and the character vector for the character.
Optionally, in one implementation, the character vector x_i corresponding to each character obtained in S601 is used in turn as the input of the Bi-LSTM, and through recurrent computation the output vector y_i of each LSTM unit is obtained. The dimension of y_i is 21 (6 accident types plus 4 accident elements, each category having a "B-" and an "I-" label, plus one "O" label), and y_i holds the probability values corresponding to the 21 labels; the final output of the Bi-LSTM is n × 21, i.e. (y_1, y_2, ..., y_n).
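A PyTorch sketch of this Bi-LSTM stage is shown below; summing the forward and backward outputs follows the description above, while the hidden size and class names are assumptions.

```python
import torch
from torch import nn

NUM_TAGS = 21   # 10 categories x {B-, I-} + O

class BiLSTMEmitter(nn.Module):
    """Maps (n, 128) character vectors to (n, 21) per-character label scores y_1..y_n."""
    def __init__(self, input_dim=128, hidden_dim=128):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.to_tags = nn.Linear(hidden_dim, NUM_TAGS)

    def forward(self, x):                       # x: (batch, n, 128)
        out, _ = self.bilstm(x)                 # (batch, n, 2 * hidden_dim)
        fwd, bwd = out.chunk(2, dim=-1)         # split forward / backward halves
        return self.to_tags(fwd + bwd)          # add them, then project to 21 labels
```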
S603, adjusting the initial probability vector of each character according to the constraint condition; obtaining a target probability vector; the constraint condition is obtained by learning the neural network model through historical training data;
The constraint condition is obtained by the neural network model through learning on historical training data. Optionally, the constraint is learned by the CRF layer during training; for example, the label transition "B-Label1 I-Label1" is valid, whereas "B-Label1 I-Label2" is invalid.
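As a hedged sketch of how such a constraint can be represented, the CRF transition (jump) scores can be stored in a learned matrix whose invalid entries are masked out; the label names, the masking value, and the validity rule below are assumptions for illustration:

```python
import torch

labels = ["O"] + [f"{p}-Label{k}" for k in range(1, 11) for p in ("B", "I")]  # 21 tags
idx = {t: i for i, t in enumerate(labels)}

# Learned transition (jump) scores between the label of character j-1 and character j.
transitions = torch.nn.Parameter(torch.randn(len(labels), len(labels)))

def is_valid(prev: str, curr: str) -> bool:
    # Assumed BIO rule: "I-X" may only follow "B-X" or "I-X" of the same category X.
    if curr.startswith("I-"):
        return prev in (f"B-{curr[2:]}", curr)
    return True

# Mask invalid transitions so the CRF never selects them when adjusting the vectors.
with torch.no_grad():
    for p in labels:
        for c in labels:
            if not is_valid(p, c):
                transitions[idx[p], idx[c]] = -10000.0
```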
S604, based on the target probability vector of each character, determining the accident information category to which each character belongs as a prediction classification result.
According to the scheme of the embodiment of the invention, the accident information determining efficiency can be improved. Furthermore, the neural network model can be trained through the target sample text, and then an information classification model can be obtained. Therefore, when the accident information needs to be extracted, the information classification model can be used for extracting effective text sections containing effective information in the accident reporting text and the accident information class to which each effective text section belongs, and then the accident information is determined based on the effective text sections. Therefore, the scheme provided by the embodiment provides a basis for improving the accident information determination efficiency.
Optionally, in an embodiment, before determining the initial probability vector of the character based on the first character feature of the character, the method further includes:
acquiring a postamble feature and a subsequent character feature of each character according to the reverse character order of the characters in the sample text; wherein the postamble feature is a feature extracted based on all the characters following the character, and the subsequent character feature is: a second character feature extracted based on the character immediately following the character;
calculating a second character feature of the character based on the postamble feature, the subsequent character feature and the character vector of the character;
at this time, determining an initial probability vector of the character based on the first character feature of the character may include:
an initial probability vector for the character is determined based on a first character feature of the character and a second character feature of the character.
According to the scheme of the embodiment of the invention, the accident information determining efficiency can be improved. Furthermore, the neural network model can be trained through the target sample text, and then an information classification model can be obtained. Therefore, when the accident information needs to be extracted, the information classification model can be used for extracting effective text sections containing effective information in the accident reporting text and the accident information class to which each effective text section belongs, and then the accident information is determined based on the effective text sections. Therefore, the scheme provided by the embodiment provides a basis for improving the accident information determination efficiency.
Based on the embodiment shown in fig. 2, as shown in fig. 7, an embodiment of the present invention further provides a method for training the information classification model, where the adjusting of the network parameters of the neural network model based on the difference between the prediction classification result and the calibration result of the target sample text in step S203 may include:
S701, calculating a loss function value of the neural network model based on the prediction classification result and the calibration result of the target sample text, wherein the loss function value is used as the difference between the prediction classification result and the calibration result of the target sample text;
optionally, in an embodiment, the number of the predicted classification results is multiple;
calculating a loss function value of the neural network model based on the prediction classification result and the calibration result of the target sample text, which may include:
step 1: calculating a result score for each predicted classification result;
in one implementation, the result score for each predicted classification result is calculated using the following formula:
$$P_i = e^{S_i}, \qquad S_i = \sum_{j=1}^{n} E_{j,\,y_j} + \sum_{j=2}^{n} T_{y_{j-1},\,y_j}$$

wherein $P_i$ is the result score of the i-th prediction classification result; $E_{j,\,y_j}$ represents the probability of the accident information category to which the j-th character belongs in the i-th prediction result; and $T_{y_{j-1},\,y_j}$ represents the jump (transition) probability between the accident information category to which the (j-1)-th character belongs and the accident information category to which the j-th character belongs in the i-th prediction result.
Step 2: adding the result scores of all the prediction classification results to obtain a prediction score of the neural network model;
Step 3: calculating a loss function value of the neural network model based on the prediction score and the marking score corresponding to the calibration result of the target sample text.
In one implementation, the loss function value of the neural network model is calculated according to the following formula:
$$\mathrm{LossFunction} = -\log\frac{P_{\mathrm{RealPath}}}{P_{\mathrm{total}}}$$

wherein LossFunction is the loss function value, $P_{\mathrm{RealPath}}$ is the marking score corresponding to the calibration result of the target sample text, and $P_{\mathrm{total}}$ is the prediction score.
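To make the computation concrete, the following toy sketch evaluates the path scores, the prediction score and the loss exactly as stated above by brute force; it is an illustrative restatement under made-up emission/transition values, not the patented code (in practice the total score is computed with the forward algorithm rather than by enumeration):

```python
import math
from itertools import product

def path_score(E, T, path):
    """S_i = sum_j E[j][y_j] + sum_j T[y_{j-1}][y_j] for one predicted label path."""
    s = sum(E[j][y] for j, y in enumerate(path))
    s += sum(T[path[j - 1]][path[j]] for j in range(1, len(path)))
    return s

def crf_loss(E, T, real_path):
    """LossFunction = -log(P_RealPath / P_total), enumerating every possible path."""
    n, k = len(E), len(E[0])
    p_total = sum(math.exp(path_score(E, T, p)) for p in product(range(k), repeat=n))
    p_real = math.exp(path_score(E, T, real_path))
    return -math.log(p_real / p_total)

# Toy example: 3 characters, 2 labels (values are made up for illustration).
E = [[1.0, 0.2], [0.3, 0.9], [0.8, 0.1]]   # emission scores E[j][label]
T = [[0.5, -0.2], [0.1, 0.4]]              # transition (jump) scores T[prev][curr]
print(crf_loss(E, T, real_path=[0, 1, 0]))
```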
S702, based on the loss function value, network parameters of the neural network model are adjusted.
According to the scheme of the embodiment of the invention, the accident information determining efficiency can be improved. Furthermore, the neural network model can be trained through the target sample text, and then an information classification model can be obtained. Therefore, when the accident information needs to be extracted, the information classification model can be used for extracting effective text sections containing effective information in the accident reporting text and the accident information class to which each effective text section belongs, and then the accident information is determined based on the effective text sections. Therefore, the scheme provided by the embodiment provides a basis for improving the accident information determination efficiency.
Optionally, in an embodiment, after processing each text segment in the accident reporting text by using an information extraction operation corresponding to the accident information category to which the text segment belongs based on a preset correspondence between the accident information category and the text processing operation to obtain the accident information of the to-be-processed accident recorded in the text segment, the method may further include:
based on the obtained accident information, an accident information report table is generated.
Illustratively, fig. 8 shows an accident information report table provided by an embodiment of the present invention, in which the extracted accident information is filled into the corresponding positions of the report table: the extracted accident time is filled into the corresponding time text box, the extracted accident location into the corresponding location text box, and the extracted accident type into the corresponding type text box, thereby generating the accident information report table.
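Purely for illustration (the field and category names below are assumptions, not the patented schema), the mapping from extracted accident information to the report-table fields of fig. 8 can be sketched as:

```python
def build_report_table(extracted: dict) -> dict:
    """Fill extracted accident information into report-table fields (fig. 8 style)."""
    return {
        "time": extracted.get("accident_time", ""),
        "location": extracted.get("accident_location", ""),
        "type": extracted.get("accident_type", ""),
    }

report = build_report_table({
    "accident_time": "four o'clock today",
    "accident_location": "XX intersection",
    "accident_type": "traffic accident",
})
```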
Corresponding to the method provided above, as shown in fig. 9, an embodiment of the present invention further provides an accident information extraction apparatus, including:
a text obtaining module 901, configured to obtain an accident report receiving text; wherein the accident report receiving text is: a text generated based on an accident reporting dialog for the accident to be processed;
the information classification module 902 is configured to perform accident information classification on the accident receiving text by using a pre-trained information classification model to obtain effective text segments each including effective information in the accident receiving text and an accident information category to which each effective text segment belongs; wherein, the information classification model is as follows: training based on the sample text and the corresponding labeling result, wherein the labeling result is used for indicating: each effective text segment in the sample text and the accident information category to which each effective text segment belongs;
the information extraction module 903 is configured to, for each valid text segment in the accident reporting text, based on a preset correspondence between the accident information category and the text processing operation, process the text segment by using an information extraction operation corresponding to the accident information category to which the text segment belongs, and obtain accident information of the to-be-processed accident recorded by the text segment.
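A minimal sketch of the "preset correspondence between accident information category and text processing operation" that the information extraction module relies on; the category names and the extraction operations themselves are assumptions for illustration, not the patented operations:

```python
import re

# Assumed per-category extraction operations.
def extract_time(segment: str) -> str:
    m = re.search(r"\d{1,2}:\d{2}|\d{1,2} o'clock", segment)
    return m.group(0) if m else segment

def extract_location(segment: str) -> str:
    return segment.strip()  # e.g. keep the whole effective segment as the location

def extract_casualty_count(segment: str) -> str:
    m = re.search(r"\d+", segment)
    return m.group(0) if m else ""

# Preset correspondence: accident information category -> text processing operation.
OPERATIONS = {
    "accident_time": extract_time,
    "accident_location": extract_location,
    "casualties": extract_casualty_count,
}

def extract_accident_info(classified_segments):
    """classified_segments: list of (category, effective_text_segment) pairs."""
    return {cat: OPERATIONS.get(cat, lambda s: s)(seg) for cat, seg in classified_segments}
```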
Optionally, the information classification model is trained according to the following modules:
the text acquisition module is used for acquiring a target sample text from the training sample set;
the text input module is used for inputting the target sample text into the neural network model to be trained so that the neural network model can classify the accident information of the target sample text to obtain a prediction classification result; wherein predicting the classification result comprises: each effective text segment in the target sample text and the accident information category to which each effective text segment belongs;
the parameter adjusting module is used for adjusting network parameters of the neural network model based on the difference between the prediction classification result and the calibration result of the target sample text; and returning to the step of obtaining the target sample text from the training sample set.
Optionally, the labeling result corresponding to each sample text is: the accident information category of each character in the effective text segment of the sample text;
a neural network model, comprising:
the vector conversion module is used for converting each character in the target sample text into a character vector corresponding to the character;
the cyclic processing module is used for sequentially carrying out cyclic processing on the character vectors of the characters based on the character sequence of the characters in the sample text to obtain an initial probability vector of each character; wherein, the numerical value of each dimension in the probability vector of each character is characterized by: a probability that the character belongs to an accident information type corresponding to the dimension;
the probability adjusting module is used for adjusting the initial probability vector of each character according to the constraint condition; obtaining a target probability vector; the constraint condition is obtained by learning the neural network model through historical training data;
and the category prediction module is used for determining the accident information category to which each character belongs based on the target probability vector of each character as a prediction classification result.
Optionally, the loop processing module is specifically configured to, for each character in the characters, obtain an initial probability vector of the character in the following manner: acquiring a preamble feature and a preceding character feature of the character according to the forward character order of the characters in the sample text; wherein the preamble feature is a feature extracted based on all the characters preceding the character, and the preceding character feature is: a first character feature extracted based on the character immediately preceding the character; calculating a first character feature of the character based on the preamble feature, the preceding character feature and the character vector of the character, and determining an initial probability vector of the character based on the first character feature of the character.
Optionally, the loop processing module is further configured to update the preceding feature based on the preceding feature and the character vector of the character after calculating the first character feature of the character based on the preceding feature, the preceding character feature and the character vector of the character.
Optionally, the loop processing module is further configured to, before determining an initial probability vector of the character based on the first character feature of the character, obtain a postamble feature and a subsequent character feature of the character according to the reverse character order of the characters in the sample text; wherein the postamble feature is a feature extracted based on all the characters following the character, and the subsequent character feature is: a second character feature extracted based on the character immediately following the character; and to calculate a second character feature of the character based on the postamble feature, the subsequent character feature and the character vector of the character;
and the loop processing module is specifically used for determining an initial probability vector of the character based on the first character characteristic of the character and the second character characteristic of the character.
Optionally, the parameter adjusting module is specifically configured to calculate a loss function value of the neural network model based on the prediction classification result and the calibration result of the target sample text, as a difference between the prediction classification result and the calibration result of the target sample text; network parameters of the neural network model are adjusted based on the loss function values.
Optionally, the number of the predicted classification results is multiple;
the parameter adjusting module is specifically used for calculating the result score of each prediction classification result; adding the result scores of all the prediction classification results to obtain a prediction score of the neural network model; and calculating a loss function value of the neural network model based on the prediction score and the marking score corresponding to the calibration result of the target sample text.
Optionally, the parameter adjusting module is specifically configured to:
the result score for each predicted classification result is calculated using the following formula:
$$P_i = e^{S_i}, \qquad S_i = \sum_{j=1}^{n} E_{j,\,y_j} + \sum_{j=2}^{n} T_{y_{j-1},\,y_j}$$

wherein $P_i$ is the result score of the i-th prediction classification result; $E_{j,\,y_j}$ represents the probability of the accident information category to which the j-th character belongs in the i-th prediction result; and $T_{y_{j-1},\,y_j}$ represents the jump (transition) probability between the accident information category to which the (j-1)-th character belongs and the accident information category to which the j-th character belongs in the i-th prediction result.
Optionally, the parameter adjusting module is specifically configured to calculate a loss function value of the neural network model according to the following formula:
$$\mathrm{LossFunction} = -\log\frac{P_{\mathrm{RealPath}}}{P_{\mathrm{total}}}$$

wherein LossFunction is the loss function value, $P_{\mathrm{RealPath}}$ is the marking score corresponding to the calibration result of the target sample text, and $P_{\mathrm{total}}$ is the prediction score.
Optionally, the labeling result corresponding to each sample text is: and marking the sample text by adopting a BIO marking mode to obtain a marking result.
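To make the BIO marking mode concrete, a small illustrative example is given below; the category name "Location" and the rendered characters are assumptions, not taken from the training data:

```python
# Illustrative BIO labeling: every character receives B-<category>, I-<category>, or O.
characters = ["X", "X", "路", "口"]  # an "XX intersection" fragment, one character each
bio_labels = ["B-Location", "I-Location", "I-Location", "I-Location"]

labeled = list(zip(characters, bio_labels))  # characters outside any category get "O"
```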
Optionally, the method further comprises: the accident report receiving text generation module is used for generating an accident report receiving text based on an accident report receiving dialogue aiming at the accident to be processed in the following mode, and comprises the following steps: and carrying out voice recognition on the accident reporting conversation aiming at the accident to be processed to generate an accident reporting text.
Optionally, the apparatus further comprises: a report table generating module, configured to generate an accident information report table based on the obtained accident information after the information extraction module has, for each effective text segment in the accident report receiving text, processed the text segment by using the information extraction operation corresponding to the accident information category to which the text segment belongs, based on the preset correspondence between the accident information category and the text processing operation, to obtain the accident information of the to-be-processed accident recorded in the text segment.
According to the technical scheme of the embodiment of the invention, the accident report receiving text of the accident to be processed can be obtained. Accident information classification is then performed on the accident report receiving text by using the pre-trained information classification model, so as to obtain the effective text segments containing effective information in the accident report receiving text and the accident information category to which each effective text segment belongs. For each effective text segment in the accident report receiving text, the text segment is processed by using the information extraction operation corresponding to the accident information category to which it belongs, according to the preset correspondence between accident information categories and text processing operations, so that the accident information of the accident to be processed recorded in the text segment can be obtained.
The embodiment of the present invention further provides an electronic device, as shown in fig. 10, which includes a processor 1001, a communication interface 1002, a memory 1003 and a communication bus 1004, wherein the processor 1001, the communication interface 1002 and the memory 1003 complete mutual communication through the communication bus 1004,
a memory 1003 for storing a computer program;
the processor 1001 is configured to implement the above-described procedure of the accident information extraction method when executing the program stored in the memory 1003.
The electronic device of the embodiment of the invention can acquire the accident report receiving text of the accident to be processed, and then classify the accident report receiving text by using the pre-trained information classification model to obtain the effective text segments containing effective information in the accident report receiving text and the accident information category to which each effective text segment belongs. For each effective text segment in the accident report receiving text, the text segment is processed by using the information extraction operation corresponding to the accident information category to which it belongs, according to the preset correspondence between accident information categories and text processing operations, so that the accident information of the accident to be processed recorded in the text segment can be obtained.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above-mentioned accident information extraction methods.
In yet another embodiment, a computer program product containing instructions is provided, which when run on a computer causes the computer to perform any of the above-described incident information extraction methods.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus, device, and system embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (16)

1. An accident information extraction method, characterized in that the method comprises:
acquiring an accident receiving text; wherein the accident receiving text is: a text generated based on an accident reporting dialog for the accident to be processed;
utilizing a pre-trained information classification model to classify the accident receiving text to obtain effective text sections each containing effective information in the accident receiving text and an accident information category to which each effective text section belongs; wherein, the information classification model is as follows: training based on the sample text and the corresponding labeling result, wherein the labeling result is used for indicating: each effective text segment in the sample text and the accident information category to which each effective text segment belongs;
and aiming at each effective text segment in the accident reporting text, based on the corresponding relation between the preset accident information category and the text processing operation, processing the text segment by adopting the information extraction operation corresponding to the accident information category to which the text segment belongs to obtain the accident information of the accident to be processed recorded by the text segment.
2. The method of claim 1, wherein the information classification model is trained in the following manner:
acquiring a target sample text from the training sample set;
inputting the target sample text into a neural network model to be trained, so that the neural network model classifies accident information of the target sample text to obtain a prediction classification result; wherein the prediction classification result comprises: each effective text segment in the target sample text and the accident information category to which each effective text segment belongs;
adjusting network parameters of the neural network model based on the difference between the prediction classification result and the calibration result of the target sample text; and returning to the step of obtaining the target sample text from the training sample set.
3. The method of claim 2, wherein each sample text corresponds to a labeling result: the accident information category of each character in the effective text segment of the sample text;
the neural network model obtains a prediction classification result in the following mode:
converting each character in the target sample text into a character vector corresponding to the character;
based on the character sequence of each character in the sample text, sequentially carrying out cyclic processing on the character vector of each character to obtain an initial probability vector of each character; wherein, the numerical value of each dimension in the probability vector of each character is characterized by: a probability that the character belongs to an accident information type corresponding to the dimension;
adjusting the initial probability vector of each character according to the constraint condition; obtaining a target probability vector; wherein the constraint condition is obtained by learning the neural network model through historical training data;
and determining the accident information category to which each character belongs as a prediction classification result based on the target probability vector of each character.
4. The method of claim 3, wherein the sequentially performing a cyclic processing on the character vectors of the characters based on the character sequence of the characters in the sample text to obtain an initial probability vector of each character comprises:
for each character in the characters, obtaining an initial probability vector of the character by adopting the following method, wherein the method comprises the following steps:
acquiring the preceding feature and the preceding character feature of the character according to the forward character order of each character in the sample text; wherein the preceding feature is a feature extracted based on all the characters preceding the character; the preceding character feature is: a first character feature extracted based on the character preceding the character;
calculating a first character feature of the character based on the preceding feature, the preceding character feature and the character vector of the character, and determining an initial probability vector of the character based on the first character feature of the character.
5. The method of claim 4, further comprising, after said computing the first character feature of the character based on the preceding feature, the preceding character feature, and the character vector of the character:
updating the preceding feature based on the preceding feature and the character vector of the character.
6. The method according to claim 4 or 5, wherein before determining the initial probability vector of the character based on the first character feature of the character, further comprising:
acquiring the postamble feature and the subsequent character feature of the character according to the reverse character order of the characters in the sample text; wherein the postamble feature is a feature extracted based on all the characters following the character; the subsequent character feature is: a second character feature extracted based on the character subsequent to the character;
calculating a second character feature of the character based on the postamble feature, the subsequent character feature and the character vector of the character;
determining an initial probability vector of the character based on the first character feature of the character, including:
an initial probability vector for the character is determined based on a first character feature of the character and a second character feature of the character.
7. The method according to any one of claims 2-5, wherein the adjusting the network parameters of the neural network model based on the difference between the predicted classification result and the calibration result of the target sample text comprises:
calculating a loss function value of the neural network model based on the prediction classification result and the calibration result of the target sample text as a difference between the prediction classification result and the calibration result of the target sample text;
adjusting network parameters of the neural network model based on the loss function values.
8. The method of claim 7, wherein the predicted classification result is plural;
the calculating a loss function value of the neural network model based on the prediction classification result and the calibration result of the target sample text comprises:
calculating a result score for each predicted classification result;
adding the result scores of all the prediction classification results to obtain a prediction score of the neural network model;
and calculating a loss function value of the neural network model based on the prediction score and the marking score corresponding to the calibration result of the target sample text.
9. The method of claim 8, wherein said calculating a result score for each predictive classification result comprises:
the result score for each predicted classification result is calculated using the following formula:
$$P_i = e^{S_i}, \qquad S_i = \sum_{j=1}^{n} E_{j,\,y_j} + \sum_{j=2}^{n} T_{y_{j-1},\,y_j}$$

wherein $P_i$ is the result score of the i-th prediction classification result; $E_{j,\,y_j}$ represents the probability of the accident information category to which the j-th character belongs in the i-th prediction result; and $T_{y_{j-1},\,y_j}$ represents the jump probability between the accident information category to which the (j-1)-th character belongs and the accident information category to which the j-th character belongs in the i-th prediction result.
10. The method of claim 8 or 9, wherein the calculating a loss function value of the neural network model based on the prediction score and the marking score corresponding to the calibration result of the target sample text comprises:
calculating a loss function value of the neural network model according to the following formula:
$$\mathrm{LossFunction} = -\log\frac{P_{\mathrm{RealPath}}}{P_{\mathrm{total}}}$$

wherein LossFunction is the loss function value, $P_{\mathrm{RealPath}}$ is the marking score corresponding to the calibration result of the target sample text, and $P_{\mathrm{total}}$ is the prediction score.
11. The method of claim 2, wherein each sample text corresponds to a labeling result: and marking the sample text by adopting a BIO marking mode to obtain a marking result.
12. The method of claim 1, wherein generating the incident pickup text based on the incident pickup dialog for the incident to be processed in a manner comprising:
and carrying out voice recognition on the accident reporting conversation aiming at the accident to be processed to generate an accident reporting text.
13. The method according to claim 1, wherein after the text segment is processed by using an information extraction operation corresponding to the accident information category to which the text segment belongs based on a preset correspondence between an accident information category and a text processing operation for each valid text segment in the accident report text to obtain the accident information of the accident to be processed recorded in the text segment, the method further comprises:
based on the obtained accident information, an accident information report table is generated.
14. An accident information extraction apparatus, characterized in that the apparatus comprises:
the text acquisition module is used for acquiring an accident receiving text; wherein the accident receiving text is: a text generated based on an accident reporting dialog for the accident to be processed;
the information classification module is used for classifying the accident information of the accident receiving text by utilizing a pre-trained information classification model to obtain effective text sections containing effective information in the accident receiving text and the accident information category of each effective text section; wherein, the information classification model is as follows: training based on the sample text and the corresponding labeling result, wherein the labeling result is used for indicating: each effective text segment in the sample text and the accident information category to which each effective text segment belongs;
and the information extraction module is used for processing each effective text segment in the accident reporting text by adopting the information extraction operation corresponding to the accident information category to which the text segment belongs based on the corresponding relation between the preset accident information category and the text processing operation so as to obtain the accident information of the to-be-processed accident recorded by the text segment.
15. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-13 when executing a program stored in the memory.
16. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 13.
CN202110896545.5A 2021-08-05 2021-08-05 Accident information extraction method and device and electronic equipment Pending CN114385795A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110896545.5A CN114385795A (en) 2021-08-05 2021-08-05 Accident information extraction method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110896545.5A CN114385795A (en) 2021-08-05 2021-08-05 Accident information extraction method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114385795A true CN114385795A (en) 2022-04-22

Family

ID=81194509

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110896545.5A Pending CN114385795A (en) 2021-08-05 2021-08-05 Accident information extraction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114385795A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908614A (en) * 2017-10-12 2018-04-13 北京知道未来信息技术有限公司 A kind of name entity recognition method based on Bi LSTM
CN110765265A (en) * 2019-09-06 2020-02-07 平安科技(深圳)有限公司 Information classification extraction method and device, computer equipment and storage medium
CN111428981A (en) * 2020-03-18 2020-07-17 国电南瑞科技股份有限公司 Deep learning-based power grid fault plan information extraction method and system
US20200364407A1 (en) * 2019-05-14 2020-11-19 Korea University Research And Business Foundation Method and server for text classification using multi-task learning
CN112269949A (en) * 2020-10-19 2021-01-26 杭州叙简科技股份有限公司 Information structuring method based on accident disaster news


Similar Documents

Publication Publication Date Title
CN107729309B (en) Deep learning-based Chinese semantic analysis method and device
CN106991085B (en) Entity abbreviation generation method and device
CN112100354B (en) Man-machine conversation method, device, equipment and storage medium
CN104820629A (en) Intelligent system and method for emergently processing public sentiment emergency
CN113688221B (en) Model-based conversation recommendation method, device, computer equipment and storage medium
CN110134950B (en) Automatic text proofreading method combining words
Bokka et al. Deep Learning for Natural Language Processing: Solve your natural language processing problems with smart deep neural networks
CN111414746A (en) Matching statement determination method, device, equipment and storage medium
CN113919366A (en) Semantic matching method and device for power transformer knowledge question answering
CN112395391B (en) Concept graph construction method, device, computer equipment and storage medium
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN115687563A (en) Interpretable intelligent judgment method and device, electronic equipment and storage medium
CN115273815A (en) Method, device and equipment for detecting voice keywords and storage medium
CN116050425A (en) Method for establishing pre-training language model, text prediction method and device
CN114491023A (en) Text processing method and device, electronic equipment and storage medium
CN112183060B (en) Reference resolution method of multi-round dialogue system
CN117114707A (en) Training method, prediction method and device for risk-escape anti-fraud prediction model
CN109635289B (en) Entry classification method and audit information extraction method
CN116702765A (en) Event extraction method and device and electronic equipment
CN115376547B (en) Pronunciation evaluation method, pronunciation evaluation device, computer equipment and storage medium
WO2023137903A1 (en) Reply statement determination method and apparatus based on rough semantics, and electronic device
CN113627197B (en) Text intention recognition method, device, equipment and storage medium
CN114385795A (en) Accident information extraction method and device and electronic equipment
CN115906824A (en) Text fine-grained emotion analysis method, system, medium and computing equipment
CN114861626A (en) Traffic warning condition processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 100013 no.a4, Hepingli District 9, Dongcheng District, Beijing

Applicant after: Big data center of emergency management department

Address before: 100013 no.a4, Hepingli District 9, Dongcheng District, Beijing

Applicant before: Communication and information center of emergency management department

RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20220422