CN114385795A - Accident information extraction method and device and electronic equipment - Google Patents

Accident information extraction method and device and electronic equipment

Info

Publication number
CN114385795A
Authority
CN
China
Prior art keywords
character
text
accident
information
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110896545.5A
Other languages
Chinese (zh)
Inventor
杨继星
房玉东
邢晓毅
柳树林
张卫伟
边路
狄瑞晟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Communication And Information Center Of Emergency Management Department
Original Assignee
Communication And Information Center Of Emergency Management Department
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Communication And Information Center Of Emergency Management Department filed Critical Communication And Information Center Of Emergency Management Department
Priority to CN202110896545.5A
Publication of CN114385795A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3343Query execution using phonetics
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3346Query execution using probabilistic model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide an accident information extraction method and apparatus and an electronic device, applied to the technical field of emergency management. The method comprises the following steps: acquiring an accident report text, where the accident report text is text generated based on an accident-reporting dialogue for the accident to be processed; performing accident information classification on the accident report text by using a pre-trained information classification model to obtain the effective text segments in the accident report text that each contain effective information and the accident information category to which each effective text segment belongs; and, for each effective text segment in the accident report text, processing the text segment by using the information extraction operation corresponding to the accident information category to which the text segment belongs, based on a preset correspondence between accident information categories and text processing operations, to obtain the accident information of the accident to be processed recorded in the text segment. With this scheme, the efficiency of determining accident information can be improved.

Description

Accident information extraction method and device and electronic equipment
Technical Field
The invention relates to the technical field of emergency management, in particular to an accident information extraction method and device and electronic equipment.
Background
In recent years, with the rise in social and economic activity, accidents of all kinds have become more frequent, adversely affecting social stability and harmony. Classified by accident type, sudden accidents mainly include work-safety accidents in industrial, mining and commercial trades, transportation accidents, building fires and the like, and are generally composed of several accident elements, for example: the time of occurrence, the place of occurrence, the number of injured persons, the number of dead persons, the economic loss, and so on.
In order to respond quickly to an emergency, after receiving an emergency call an operator of the accident report-receiving platform needs to analyze and judge the situation of the emergency from the conversation with the caller and determine the accident information of the emergency, so as to match and start a corresponding emergency plan. However, the number of accidents reported daily in large and medium-sized cities can reach the hundreds, and determining accident information purely by manual processing is inefficient.
Disclosure of Invention
The embodiment of the invention aims to provide an accident information extraction method, an accident information extraction device and electronic equipment so as to improve the accident information determination efficiency. The specific technical scheme is as follows:
in a first aspect, an embodiment of the present invention provides an accident information extraction method, where the method includes:
acquiring an accident report text; where the accident report text is: text generated based on an accident-reporting dialogue for the accident to be processed;
performing accident information classification on the accident report text by using a pre-trained information classification model to obtain effective text segments that each contain effective information in the accident report text and the accident information category to which each effective text segment belongs; where the information classification model is: a model obtained by training based on sample texts and corresponding labeling results, the labeling result being used to indicate: each effective text segment in the sample text and the accident information category to which each effective text segment belongs;
and, for each effective text segment in the accident report text, processing the text segment by using the information extraction operation corresponding to the accident information category to which the text segment belongs, based on a preset correspondence between accident information categories and text processing operations, to obtain the accident information of the accident to be processed recorded in the text segment.
Optionally, the information classification model is trained according to the following method:
acquiring a target sample text from the training sample set;
inputting the target sample text into a neural network model to be trained, so that the neural network model classifies accident information of the target sample text to obtain a prediction classification result; wherein the prediction classification result comprises: each effective text segment in the target sample text and the accident information category to which each effective text segment belongs;
adjusting network parameters of the neural network model based on the difference between the prediction classification result and the calibration result of the target sample text; and returning to the step of obtaining the target sample text from the training sample set.
Optionally, the labeling result corresponding to each sample text is: the accident information category of each character in the effective text segment of the sample text;
the neural network model obtains a prediction classification result in the following mode:
converting each character in the target sample text into a character vector corresponding to the character;
based on the character order of the characters in the sample text, sequentially performing recurrent processing on the character vector of each character to obtain an initial probability vector of each character; where the value of each dimension in the probability vector of a character represents: the probability that the character belongs to the accident information category corresponding to that dimension;
adjusting the initial probability vector of each character according to constraint conditions to obtain a target probability vector; where the constraint conditions are obtained by the neural network model through learning from historical training data;
and determining the accident information category to which each character belongs as a prediction classification result based on the target probability vector of each character.
Optionally, the performing, in sequence, cyclic processing on the character vectors of the characters based on the character sequence of the characters in the sample text to obtain an initial probability vector of each character includes:
for each character in the characters, obtaining an initial probability vector of the character by adopting the following method, wherein the method comprises the following steps:
acquiring the preamble feature and the previous-character feature of the character according to the forward character order of the characters in the sample text; where the preamble feature is a feature extracted based on all characters preceding the character, and the previous-character feature is: a first character feature extracted based on the character immediately preceding the character;
calculating a first character feature of the character based on the preamble feature, the previous-character feature and the character vector of the character, and determining an initial probability vector of the character based on the first character feature of the character.
Optionally, after the calculating of the first character feature of the character based on the preamble feature, the previous-character feature and the character vector of the character, the method further includes:
updating the preamble feature based on the previous-character feature and the character vector of the character.
Optionally, before determining the initial probability vector of the character based on the first character feature of the character, the method further includes:
acquiring the postamble feature and the next-character feature of the character according to the reverse character order of the characters in the sample text; where the postamble feature is a feature extracted based on all characters following the character, and the next-character feature is: a second character feature extracted based on the character immediately following the character;
calculating a second character feature of the character based on the postamble feature, the next-character feature and the character vector of the character;
determining an initial probability vector of the character based on the first character feature of the character, including:
an initial probability vector for the character is determined based on a first character feature of the character and a second character feature of the character.
Optionally, the adjusting the network parameters of the neural network model based on the difference between the prediction classification result and the calibration result of the target sample text includes:
calculating a loss function value of the neural network model based on the prediction classification result and the calibration result of the target sample text as a difference between the prediction classification result and the calibration result of the target sample text;
adjusting network parameters of the neural network model based on the loss function values.
Optionally, the number of the prediction classification results is multiple;
the calculating a loss function value of the neural network model based on the prediction classification result and the calibration result of the target sample text comprises:
calculating a result score for each predicted classification result;
adding the result scores of all the prediction classification results to obtain a prediction score of the neural network model;
and calculating a loss function value of the neural network model based on the prediction score and the marking score corresponding to the calibration result of the target sample text.
Optionally, the calculating the result score of each predicted classification result includes:
the result score of each prediction classification result is calculated using the following formulas:

P_i = e^{S_i}

S_i = Σ_j ( E_{j, y_j^i} + T_{y_{j-1}^i, y_j^i} )

where P_i is the result score of the i-th prediction result; E_{j, y_j^i} represents the probability of the accident information category to which the j-th character belongs in the i-th prediction result; and T_{y_{j-1}^i, y_j^i} represents the jump probability between the accident information category to which the (j-1)-th character belongs and the accident information category to which the j-th character belongs in the i-th prediction result.
Optionally, the calculating a loss function value of the neural network model based on the prediction score and the annotation score corresponding to the calibration result of the target sample text includes:
calculating a loss function value of the neural network model according to the following formula:
LossFunction = -log( P_RealPath / P_total )

where LossFunction is the loss function value, P_RealPath is the annotation score corresponding to the calibration result of the target sample text, and P_total is the prediction score.
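Purely as an illustrative sketch (not part of the patented scheme), the following Python code shows how the result score of one predicted label path and the loss described above could be computed from an emission matrix and a transition matrix; the names emissions, transitions and labels are assumptions introduced here, and the exponential/log form follows the standard linear-chain CRF formulation assumed in the reconstruction above. Enumerating all paths is done only for clarity; in practice P_total is computed with the forward algorithm.

```python
import numpy as np

def path_score(emissions, transitions, labels):
    """Result score P_i = exp(S_i) of one predicted label path.

    emissions:   (n, num_tags) per-character category scores (E)
    transitions: (num_tags, num_tags) jump scores between categories (T)
    labels:      length-n list of tag indices forming the path
    """
    s = emissions[0, labels[0]]
    for j in range(1, len(labels)):
        s += emissions[j, labels[j]] + transitions[labels[j - 1], labels[j]]
    return np.exp(s)

def crf_loss(emissions, transitions, all_paths, real_path):
    """LossFunction = -log(P_RealPath / P_total), with P_total the sum of all path scores."""
    p_total = sum(path_score(emissions, transitions, p) for p in all_paths)
    p_real = path_score(emissions, transitions, real_path)
    return -np.log(p_real / p_total)
```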
Optionally, the labeling result corresponding to each sample text is: a labeling result obtained by labeling the sample text in a BIO labeling manner.
Optionally, the accident report text is generated based on the accident-reporting dialogue for the accident to be processed in the following manner:
performing voice recognition on the accident-reporting dialogue for the accident to be processed to generate the accident report text.
Optionally, after the information extraction operation corresponding to the accident information category to which the text segment belongs is adopted to process the text segment based on the preset corresponding relationship between the accident information category and the text processing operation for each effective text segment in the accident reporting text, and the accident information of the accident to be processed recorded in the text segment is obtained, the method further includes:
based on the obtained accident information, an accident information report table is generated.
In a second aspect, an embodiment of the present invention provides an accident information extraction apparatus, where the apparatus includes:
a text acquisition module, configured to acquire an accident report text; where the accident report text is: text generated based on an accident-reporting dialogue for the accident to be processed;
the information classification module is used for classifying the accident information of the accident receiving text by utilizing a pre-trained information classification model to obtain effective text sections containing effective information in the accident receiving text and the accident information category of each effective text section; wherein, the information classification model is as follows: training based on the sample text and the corresponding labeling result, wherein the labeling result is used for indicating: each effective text segment in the sample text and the accident information category to which each effective text segment belongs;
and the information extraction module is used for processing each effective text segment in the accident reporting text by adopting the information extraction operation corresponding to the accident information category to which the text segment belongs based on the corresponding relation between the preset accident information category and the text processing operation so as to obtain the accident information of the to-be-processed accident recorded by the text segment.
Optionally, the information classification model is trained according to the following modules:
the text acquisition module is used for acquiring a target sample text from the training sample set;
the text input module is used for inputting the target sample text into a neural network model to be trained so that the neural network model can classify the accident information of the target sample text to obtain a prediction classification result; wherein the prediction classification result comprises: each effective text segment in the target sample text and the accident information category to which each effective text segment belongs;
the parameter adjusting module is used for adjusting network parameters of the neural network model based on the difference between the prediction classification result and the calibration result of the target sample text; and returning to the step of obtaining the target sample text from the training sample set.
Optionally, the labeling result corresponding to each sample text is: the accident information category of each character in the effective text segment of the sample text;
the neural network model comprises:
the vector conversion module is used for converting each character in the target sample text into a character vector corresponding to the character;
the cyclic processing module is used for sequentially carrying out cyclic processing on the character vectors of the characters based on the character sequence of the characters in the sample text to obtain an initial probability vector of each character; wherein, the numerical value of each dimension in the probability vector of each character is characterized by: a probability that the character belongs to an accident information type corresponding to the dimension;
the probability adjusting module is used for adjusting the initial probability vector of each character according to the constraint condition; obtaining a target probability vector; wherein the constraint condition is obtained by learning the neural network model through historical training data;
and the class prediction module is used for determining the accident information class to which each character belongs based on the target probability vector of each character as a prediction classification result.
Optionally, the loop processing module is specifically configured to obtain, for each of the characters, the initial probability vector of the character in the following manner: acquiring the preamble feature and the previous-character feature of the character according to the forward character order of the characters in the sample text, where the preamble feature is a feature extracted based on all characters preceding the character, and the previous-character feature is: a first character feature extracted based on the character immediately preceding the character; calculating a first character feature of the character based on the preamble feature, the previous-character feature and the character vector of the character, and determining an initial probability vector of the character based on the first character feature of the character.
Optionally, the loop processing module is further configured to, after the first character feature of the character is calculated based on the preamble feature, the previous-character feature and the character vector of the character, update the preamble feature based on the previous-character feature and the character vector of the character.
Optionally, the loop processing module is further configured to, before the initial probability vector of the character is determined based on the first character feature of the character, acquire the postamble feature and the next-character feature of the character according to the reverse character order of the characters in the sample text, where the postamble feature is a feature extracted based on all characters following the character, and the next-character feature is: a second character feature extracted based on the character immediately following the character; and calculate a second character feature of the character based on the postamble feature, the next-character feature and the character vector of the character;
the loop processing module is specifically configured to determine an initial probability vector of the character based on a first character feature of the character and a second character feature of the character.
Optionally, the parameter adjusting module is specifically configured to calculate a loss function value of the neural network model based on the prediction classification result and the calibration result of the target sample text, as a difference between the prediction classification result and the calibration result of the target sample text; adjusting network parameters of the neural network model based on the loss function values.
Optionally, the prediction classification result is multiple;
the parameter adjusting module is specifically configured to calculate the result score of each prediction classification result; add the result scores of all the prediction classification results to obtain a prediction score of the neural network model; and calculate a loss function value of the neural network model based on the prediction score and the annotation score corresponding to the calibration result of the target sample text.
Optionally, the parameter adjusting module is specifically configured to:
the result score of each prediction classification result is calculated using the following formulas:

P_i = e^{S_i}

S_i = Σ_j ( E_{j, y_j^i} + T_{y_{j-1}^i, y_j^i} )

where P_i is the result score of the i-th prediction result; E_{j, y_j^i} represents the probability of the accident information category to which the j-th character belongs in the i-th prediction result; and T_{y_{j-1}^i, y_j^i} represents the jump probability between the accident information category to which the (j-1)-th character belongs and the accident information category to which the j-th character belongs in the i-th prediction result.
Optionally, the parameter adjusting module is specifically configured to calculate a loss function value of the neural network model according to the following formula:
LossFunction = -log( P_RealPath / P_total )

where LossFunction is the loss function value, P_RealPath is the annotation score corresponding to the calibration result of the target sample text, and P_total is the prediction score.
Optionally, the labeling result corresponding to each sample text is: a labeling result obtained by labeling the sample text in a BIO labeling manner.
Optionally, the apparatus further comprises: an accident report text generation module, configured to generate the accident report text based on the accident-reporting dialogue for the accident to be processed in the following manner: performing voice recognition on the accident-reporting dialogue for the accident to be processed to generate the accident report text.
Optionally, the apparatus further comprises: a report table generation module, configured to, after the information extraction module has, for each effective text segment in the accident report text, processed the text segment with the information extraction operation corresponding to the accident information category to which the text segment belongs based on the preset correspondence between accident information categories and text processing operations and obtained the accident information of the accident to be processed recorded in the text segment, generate an accident information report table based on the obtained accident information.
In a third aspect, an embodiment of the present invention provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of the first aspect when executing a program stored in the memory.
In a fourth aspect, the present invention provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method steps of any one of the first aspect.
The embodiment of the invention has the following beneficial effects:
according to the accident information extraction method provided by the embodiment of the invention, the accident report text of the accident to be processed can be acquired; accident information classification is then performed on the accident report text with the pre-trained information classification model to obtain the effective text segments in the accident report text that each contain effective information and the accident information category to which each effective text segment belongs; and, for each effective text segment in the accident report text, the text segment is processed with the information extraction operation corresponding to the accident information category to which it belongs, according to the preset correspondence between accident information categories and text processing operations, to obtain the accident information of the accident to be processed recorded in the text segment. It can be seen that, with this scheme, accident information is extracted automatically rather than purely by manual processing, so the efficiency of determining accident information can be improved.
Of course, not all of the advantages described above need to be achieved at the same time in the practice of any one product or method of the invention.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other embodiments can be obtained by using the drawings without creative efforts.
Fig. 1 is a flowchart of an accident information extraction method according to an embodiment of the present invention;
FIG. 2 is another flow chart of an accident information extraction method according to an embodiment of the present invention;
FIG. 3 is a diagram illustrating an incident report text according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a neural network model provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of a single LSTM structure provided by an embodiment of the present invention;
FIG. 6 is another flowchart of an accident information extraction method according to an embodiment of the present invention;
FIG. 7 is another flowchart of an accident information extraction method according to an embodiment of the present invention;
FIG. 8 is a diagram illustrating an information report table according to an embodiment of the present invention;
fig. 9 is a schematic structural diagram of an accident information extraction apparatus according to an embodiment of the present invention;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In recent years, with the rise in social and economic activity, accidents of all kinds have become more frequent, adversely affecting social stability and harmony. Classified by accident type, sudden accidents mainly include work-safety accidents in industrial, mining and commercial trades, transportation accidents, building fires and the like, and are generally composed of several accident elements, for example: the time of occurrence, the place of occurrence, the number of injured persons, the number of dead persons, the economic loss, and so on.
In order to respond quickly to an emergency, after receiving an emergency call an operator of the accident report-receiving platform needs to analyze and judge the situation of the emergency from the conversation with the caller and determine the accident information of the emergency, so as to match and start a corresponding emergency plan. However, the number of accidents reported daily in large and medium-sized cities can reach the hundreds, and determining accident information purely by manual processing is inefficient.
In order to improve the efficiency of accident information determination, the embodiment of the invention provides an accident information extraction method and device and electronic equipment.
The embodiment of the invention can be applied to various electronic devices, such as personal computers, servers, mobile phones and other devices with data processing capability. Moreover, the accident information extraction method provided by the embodiment of the invention can be realized in a software, hardware or software and hardware combination mode.
In an embodiment, the electronic device applying the accident information extraction method provided by the invention can be pre-deployed with the pre-trained information classification model. Or, in another embodiment, the pre-trained information classification model may also be deployed in a cloud, and the electronic device applying the accident information extraction method provided by the present invention may communicate with the cloud. In another embodiment, the execution subject of the present invention may be deployed in an accident report-receiving platform, for example a server, a terminal device, or the like in the report-receiving platform.
The accident information extraction method provided by the embodiment of the invention can comprise the following steps:
acquiring an accident report text; where the accident report text is: text generated based on an accident-reporting dialogue for the accident to be processed;
performing accident information classification on the accident report text by using a pre-trained information classification model to obtain effective text segments that each contain effective information in the accident report text and the accident information category to which each effective text segment belongs; where the information classification model is: a model obtained by training based on sample texts and corresponding labeling results, the labeling result being used to indicate: each effective text segment in the sample text and the accident information category to which each effective text segment belongs;
and, for each effective text segment in the accident report text, processing the text segment by using the information extraction operation corresponding to the accident information category to which the text segment belongs, based on a preset correspondence between accident information categories and text processing operations, to obtain the accident information of the accident to be processed recorded in the text segment.
According to the technical scheme of the embodiment of the invention, the accident report receiving text of the accident to be processed can be obtained, then the accident information classification is carried out on the accident report receiving text by utilizing the pre-trained information classification model, the effective text sections containing the effective information in the accident report receiving text and the accident information category to which each effective text section belongs are obtained, and the text sections are processed by adopting the information extraction operation corresponding to the accident information category to which the text sections belong according to the corresponding relation between the preset accident information category and the text processing operation aiming at each effective text section in the accident report receiving text, so that the accident information of the accident to be processed recorded in the text sections can be obtained.
The following describes an accident information extraction method provided by an embodiment of the present invention in detail with reference to the accompanying drawings.
As shown in fig. 1, an accident information extraction method provided in an embodiment of the present invention may include the following steps:
s101, acquiring an accident receiving text;
wherein, the accident reporting file is as follows: text generated based on an incident reporting dialog for the incident to be processed.
The accident-reporting dialogue can be a call recording of the accident report-receiving platform. The dialogue content can be what the caller describes about the accident to be processed after the report-receiving platform receives the caller's emergency call. For example: "There is a fire in the XX residential community, come and put it out quickly."
In one implementation, the following method may be adopted to generate the accident reporting text based on the accident reporting dialog for the accident to be processed, including:
and carrying out voice recognition on the accident reporting conversation aiming at the accident to be processed to generate an accident reporting text.
The voice content of the accident reporting conversation can be converted into text content through a voice recognition offline SDK (Software Development Kit), so as to generate an accident reporting text. Or the accident reporting receiving conversation can be uploaded to the cloud end in an online voice recognition mode, and the device with the voice recognition function, which is deployed at the cloud end, performs voice recognition on the accident reporting receiving conversation aiming at the accident to be processed to generate an accident reporting receiving text.
In one implementation, the text content generated from the voice content of the accident reporting conversation often includes a multi-person conversation, such as an operator query text, an alarm person statement text, and the like. For the embodiment of the invention, the accident information required to be determined is stated by the alarm personnel, so after the voice recognition is carried out on the accident reporting conversation, the statement texts of the alarm personnel can be combined into a section of conversation to be used as the accident reporting text.
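A minimal sketch of this merging step is shown below; it assumes the speech-recognition output has already been separated into (speaker, utterance) pairs, and the speaker tag "caller" is a hypothetical label for the reporting person.

```python
def build_report_text(transcript):
    """Keep only the caller's statements and join them into one accident report text.

    transcript: list of (speaker, utterance) tuples produced by speech recognition,
                e.g. [("operator", "..."), ("caller", "...")].
    """
    caller_parts = [utt for spk, utt in transcript if spk == "caller"]
    return "".join(caller_parts)  # Chinese text needs no separator between segments
```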
If the accident reporting text is generated by the execution main body of the invention, the generation process of the accident reporting text can be understood as the acquisition process of the accident reporting text. If the accident reporting text is generated by the electronic equipment except the execution main body, the execution main body can acquire the accident reporting text from the electronic equipment which generates the accident reporting text.
S102, performing accident information classification on the accident report text by using a pre-trained information classification model to obtain effective text segments that each contain effective information in the accident report text and the accident information category to which each effective text segment belongs; where the information classification model is: a model obtained by training based on sample texts and corresponding labeling results, the labeling result being used to indicate: each effective text segment in the sample text and the accident information category to which each effective text segment belongs;
the accident information classification may include an accident type and an accident element, wherein the accident type may include 6 types, which are dangerous chemical, mine, traffic, building construction, fire and industry, commerce and trade respectively. And the accident element may include 4 types of elements, which are the occurrence time, the occurrence place, the number of injured persons, and the number of dead persons, respectively.
An effective text segment in the text is a text segment containing effective information, and effective information can be understood as accident information. For example, the accident report text is: "Just after 4 o'clock, a tank truck exploded at the WL section of the S-hai expressway, and there are casualties at the scene." In this dialogue, "just after 4 o'clock" contains the accident time, "WL section of the S-hai expressway" contains the accident site, and "exploded" contains the accident type, so "just after 4 o'clock", "WL section of the S-hai expressway" and "exploded" are effective text segments containing effective information, and the text segments other than the effective text segments are ineffective text segments containing ineffective information.
In order to improve the efficiency of determining accident information, after the accident report text is acquired, the effective text segments in the accident report text and the accident information category to which each effective text segment belongs need to be determined. Based on this, the embodiment of the present invention trains the information classification model based on sample texts and corresponding labeling results, where the labeling result is used to indicate: each effective text segment in the sample text and the accident information category to which each effective text segment belongs. Therefore, the trained information classification model can classify an input text and determine the effective text segments in the text and the accident information category to which each effective text segment belongs. The specific training process will be described later and is not repeated here.
S103, aiming at each effective text segment in the accident reporting text, based on the corresponding relation between the preset accident information category and the text processing operation, the text segment is processed by adopting the information extraction operation corresponding to the accident information category to which the text segment belongs, and the accident information of the accident to be processed recorded by the text segment is obtained.
Because the text in the text of the accident reporting is biased to spoken language, the accident information contained in the text of the accident reporting is unclear and nonstandard, and therefore, after the effective text segments containing the effective information in the text of the accident reporting and the accident information category to which each effective text segment belongs are determined, the effective text segments need to be further processed, and the accident information of the accident to be processed recorded in the text segments is determined.
The correspondence between the preset accident information category and the text processing operation may be set based on experience and demand.
Illustratively, for the accident-time accident information category, the corresponding information extraction operation is as follows: format conversion is performed on the extracted time elements. Conversations generally mention approximate time expressions such as "today", "morning", "afternoon", "evening", "around a few minutes past", "just now", and the like; the accident occurrence time is converted into the format YYYY-MM-dd hh:mm:ss by using regular expressions.
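As an illustrative sketch only (the concrete regular expressions are not given in this text), the following Python snippet shows one way such colloquial time expressions could be normalized to the YYYY-MM-dd hh:mm:ss format; the mapping rules and keywords are assumptions.

```python
import re
from datetime import datetime, timedelta

def normalize_time(text, now=None):
    """Convert colloquial time mentions into 'YYYY-MM-dd hh:mm:ss' (illustrative rules only)."""
    now = now or datetime.now()
    day = now
    if re.search(r"yesterday|昨天", text):
        day = now - timedelta(days=1)
    hour, minute = now.hour, now.minute
    m = re.search(r"(\d{1,2})\s*(?:o'clock|点)", text)       # e.g. "4 o'clock" / "4点"
    if m:
        hour, minute = int(m.group(1)), 0
        if re.search(r"afternoon|evening|下午|晚上", text) and hour < 12:
            hour += 12                                        # crude am/pm guess
    return day.strftime(f"%Y-%m-%d {hour:02d}:{minute:02d}:00")
```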
For another example, for the accident-site accident information category, the corresponding information extraction operation is as follows: a detailed gazetteer of place names within the jurisdiction is built in advance and stored in a tree structure; after the text segment corresponding to the accident site in the accident report text is obtained, word segmentation is performed with a Chinese word segmentation library, place nouns are filtered out according to the part-of-speech tags in the result, the place nouns are searched and matched against the gazetteer, and finally the accident site is returned in the format "XX province, XX city, XX district, XX street, XX house number".
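The sketch below illustrates this step with the jieba library (one possible Chinese word segmentation library; this text does not name a specific one). The gazetteer structure and lookup are simplified assumptions, using a flat dictionary instead of the tree described above.

```python
import jieba.posseg as pseg

# Hypothetical, simplified gazetteer: place-name fragment -> fully qualified address.
GAZETTEER = {
    "XX街道": "XX省XX市XX区XX街道",
}

def extract_place(segment):
    """Segment the text, keep place nouns (POS tag 'ns'), and match them against the gazetteer."""
    place_words = [word for word, flag in pseg.cut(segment) if flag.startswith("ns")]
    for word in place_words:
        if word in GAZETTEER:
            return GAZETTEER[word]
    return "".join(place_words) or None   # fall back to the raw place nouns
```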
For another example, for the number-of-deaths accident information category, the corresponding information extraction operation is: after the text segment corresponding to the number of deaths in the accident report text is obtained, the number of deaths in the text segment is converted into Arabic-numeral format. If the extracted number is in Chinese format (e.g. "twenty-one"), it is matched with a regular expression over Chinese digit characters (one, two, ..., nine) and unit characters (ten, hundred, thousand, ten thousand, ...) and then converted into Arabic-numeral format.
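A minimal sketch of such a conversion is shown below, handling only the small Chinese numerals that typically occur in casualty counts; the exact regular expression used by the scheme is not reproduced here and the pattern below is an assumption.

```python
import re

DIGITS = {"零": 0, "一": 1, "二": 2, "两": 2, "三": 3, "四": 4,
          "五": 5, "六": 6, "七": 7, "八": 8, "九": 9}

def chinese_to_arabic(text):
    """Convert small Chinese numerals such as '二十一' (twenty-one) to 21."""
    m = re.search(r"[零一二两三四五六七八九十百]+", text)
    if not m:
        return None
    value, unit_value = 0, 0
    for ch in m.group(0):
        if ch in DIGITS:
            unit_value = DIGITS[ch]
        elif ch == "十":
            value += (unit_value or 1) * 10   # e.g. '二十' -> 20, bare '十' -> 10
            unit_value = 0
        elif ch == "百":
            value += (unit_value or 1) * 100
            unit_value = 0
    return value + unit_value
```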
according to the technical scheme of the embodiment of the invention, the accident report receiving text of the accident to be processed can be obtained, then the accident information classification is carried out on the accident report receiving text by utilizing the pre-trained information classification model, the effective text sections containing the effective information in the accident report receiving text and the accident information category to which each effective text section belongs are obtained, and the text sections are processed by adopting the information extraction operation corresponding to the accident information category to which the text sections belong according to the corresponding relation between the preset accident information category and the text processing operation aiming at each effective text section in the accident report receiving text, so that the accident information of the accident to be processed recorded in the text sections can be obtained.
Based on the embodiment shown in fig. 1, as shown in fig. 2, the embodiment further provides an accident information extraction method, which trains an information classification model according to the following steps:
s201, acquiring a target sample text from a training sample set;
the construction process of the training sample set comprises the following steps: collecting a large number of accident receiving conversations, converting the conversations into sample texts through a voice recognition technology, and storing the sample texts in an initial corpus; further, each sample text in the initial pre-material library is labeled, each effective text segment in the sample text and the accident information category to which each effective text segment belongs are labeled, and a labeling result of the sample text is obtained.
In one implementation, the labeling result corresponding to each sample text is obtained by labeling the sample text in the BIO (Begin-Inside-Outside) manner. A character labeled B is the beginning of an effective text segment (an accident type or an accident element), a character labeled I is in the middle or at the end of an effective text segment, and a character labeled O belongs to an ineffective text segment. For example, as shown in FIG. 3, for the accident report text "Just after 4 o'clock, a tank truck exploded at the WL section of the S-hai expressway, and there are casualties at the scene", the segments "just after 4 o'clock", "WL section of the S-hai expressway" and "exploded" are effective text segments; the first character of each effective text segment is labeled B, the remaining characters of each effective text segment are labeled I, and all other characters are labeled O. Further, each character is also labeled with the accident information category to which its text segment belongs, such as time, place and type in FIG. 3.
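As a small illustration of this labeling scheme (not taken from the original filing), the helper below turns labeled spans into per-character BIO tags; the category names TIME and TYPE and the example sentence are assumptions introduced here.

```python
def bio_tags(text, segments):
    """Produce per-character BIO tags from labeled effective text segments.

    segments: list of (start, end, category) spans, e.g. [(0, 5, "TIME")];
              the category names are placeholders for illustration.
    """
    tags = ["O"] * len(text)
    for start, end, cat in segments:
        tags[start] = f"B-{cat}"
        for i in range(start + 1, end):
            tags[i] = f"I-{cat}"
    return list(zip(text, tags))

# Hypothetical example: "刚刚4点多有罐车爆炸" with a TIME span and a TYPE span.
print(bio_tags("刚刚4点多有罐车爆炸", [(0, 5, "TIME"), (8, 10, "TYPE")]))
```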
After each sample text is labeled, in one implementation, all labeled sample texts are used as samples in a training sample set, or in another implementation, all labeled sample texts can be divided into two parts according to a preset proportion, one part is used as a training sample set, and the other part is used as a testing sample set. For example, the training sample set accounts for 80% and the testing sample set accounts for 20%.
After the training sample set is determined, the target sample text can be obtained from the training sample set.
S202, inputting the target sample text into a neural network model to be trained, so that the neural network model classifies accident information of the target sample text to obtain a prediction classification result; wherein predicting the classification result comprises: each effective text segment in the target sample text and the accident information category to which each effective text segment belongs;
after the target sample text is obtained, the target sample text can be input into a neural network model to be trained, so that the neural network model can classify the accident information of the target sample text to obtain a prediction classification result. The following embodiments will be further described, and will not be described herein.
S203, adjusting network parameters of the neural network model based on the difference between the prediction classification result and the calibration result of the target sample text; and returning to the step of obtaining the target sample text from the training sample set.
This step will be further described in the following embodiments, which are not described herein again.
According to the scheme of the embodiment of the invention, the accident information determining efficiency can be improved. Furthermore, the neural network model can be trained through the target sample text, and then an information classification model can be obtained. Therefore, when the accident information needs to be extracted, the information classification model can be used for extracting effective text sections containing effective information in the accident reporting text and the accident information class to which each effective text section belongs, and then the accident information is determined based on the effective text sections. Therefore, the scheme provided by the embodiment provides a basis for improving the accident information determination efficiency.
Optionally, in an embodiment, the labeling result corresponding to each sample text is: the accident information category of each character in the effective text segment of the sample text;
in an embodiment, as shown in fig. 4, a schematic diagram of a neural network model provided in an embodiment of the present invention is shown. The neural network model includes: ALBERT, Bi-LSTM and CRF. The ALBERT is a lightweight pre-trained language representation model, and the specific function can be understood as converting a sentence into a vector form with semantic information, namely digitalization. The input of the ALBERT is characters (including Chinese characters, English words, numbers, punctuations and the like) of a news sentence, the content length is not more than 512, and the content is marked as n; the output is a calculated vector for each character, the vector dimension is 128, so the final output is n × 128 (x)1,x2,…,xn)。
The Bi-LSTM is a recurrent neural network composed of 2 × n units of identical structure, where n equals the length of the input data. Each unit consists of an input layer, a hidden layer and an output layer; the output of the first unit is used as the input of the second unit, and so on until the last unit finishes the forward pass; then computation proceeds from the last unit back towards the first unit until the backward pass is finished; the forward result and the backward result for the same input position are then added to obtain each output. Illustratively, FIG. 5 is a schematic diagram of a single LSTM unit provided by an embodiment of the present invention. The LSTM unit comprises 4 network layers, two of which use the sigmoid activation function and two of which use the hyperbolic tangent (tanh) activation function. In addition, gates such as the forget gate Γ_f and the update gate Γ_u shown in FIG. 5 control how the information is distributed; the gates are the most characteristic feature of the LSTM recurrent network and serve to retain information and filter out noise. x_i is the input of the i-th recurrent unit, which also receives the cell coefficient c_{i-1} and the activation value a_{i-1}; after computation the unit outputs y_i, the cell coefficient c_i and the activation value a_i, and c_i and a_i serve as inputs of the (i+1)-th recurrent unit. The whole process is as follows:

Γ_f = σ(W_f [a_{i-1}, x_i] + b_f)

Γ_u = σ(W_u [a_{i-1}, x_i] + b_u)

c̃_i = tanh(W_t [a_{i-1}, x_i] + b_t)

c_i = Γ_f ⊙ c_{i-1} + Γ_u ⊙ c̃_i

a_i = tanh(c_i)

y_i = a_i

where W_f, W_u and W_t are the weight coefficients of the corresponding steps, b_f, b_u and b_t are the offset coefficients, and Γ_f, Γ_u and c̃_i are intermediate variables generated in the computation.
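Purely as an illustration of the unit described above, a single forward step can be written as follows; it follows the simplified equations reconstructed here (no separate output-gate weight), and the weight shapes are assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_i, a_prev, c_prev, Wf, bf, Wu, bu, Wt, bt):
    """One LSTM unit: inputs x_i, a_{i-1}, c_{i-1}; outputs y_i, a_i, c_i."""
    z = np.concatenate([a_prev, x_i])           # [a_{i-1}, x_i]
    gamma_f = sigmoid(Wf @ z + bf)              # forget gate
    gamma_u = sigmoid(Wu @ z + bu)              # update gate
    c_tilde = np.tanh(Wt @ z + bt)              # candidate cell state
    c_i = gamma_f * c_prev + gamma_u * c_tilde  # new cell coefficient
    a_i = np.tanh(c_i)                          # new activation value
    return a_i, a_i, c_i                        # y_i = a_i
```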
At this time, based on the embodiment shown in fig. 2, as shown in fig. 6, the present embodiment further provides an accident information extraction method, which obtains a prediction classification result by using the following steps, including:
s601, converting each character in the target sample text into a character vector corresponding to the character;
each character within the target sample text is converted by ALBERT into a 128-dimensional character vector corresponding to each character.
S602, sequentially carrying out cyclic processing on the character vectors of the characters based on the character sequence of the characters in the sample text to obtain an initial probability vector of each character; wherein, the numerical value of each dimension in the probability vector of each character is characterized by: a probability that the character belongs to an accident information type corresponding to the dimension;
in one implementation, for each character in the characters, obtaining an initial probability vector for the character by:
acquiring the preamble feature and the previous-character feature of each character according to the forward character order of the characters in the sample text; where the preamble feature is a feature extracted based on all characters preceding the character, and the previous-character feature is: a first character feature extracted based on the character immediately preceding the character;
based on the preamble feature, the previous-character feature and the character vector of the character, a first character feature of the character is calculated, and based on the first character feature of the character, an initial probability vector of the character is determined.
The forward character order of the characters in the sample text is the reading order of the sample text. Illustratively, if the sample text is "A traffic accident occurred at the XX intersection at four o'clock today", then in the forward character order the characters are input one by one in reading order; correspondingly, in the reverse character order the same characters are input one by one starting from the last character and ending with the first.
Optionally, in an implementation manner, after calculating the first character feature of the character based on the preamble feature, the previous character feature, and the character vector of the character, the method further includes:
the preamble features are updated based on the preamble features and the character vector for the character.
Optionally, in one implementation, the character vector x_i corresponding to each character obtained in S601 is used in turn as the input of the Bi-LSTM, and through recurrent computation the output vector y_i of each LSTM unit is obtained. The dimension of y_i is 21 (6 accident types plus 4 accident elements, each category having a "B-" and an "I-" label, plus one "O" label), and y_i holds the probability values corresponding to the 21 labels; the final output of the Bi-LSTM is n × 21, i.e. (y_1, y_2, ..., y_n).
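A PyTorch sketch of this Bi-LSTM stage is shown below; summing the forward and backward outputs follows the description above, while the hidden size and class names are assumptions.

```python
import torch
from torch import nn

NUM_TAGS = 21   # 10 categories x {B-, I-} + O

class BiLSTMEmitter(nn.Module):
    """Maps (n, 128) character vectors to (n, 21) per-character label scores y_1..y_n."""
    def __init__(self, input_dim=128, hidden_dim=128):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.to_tags = nn.Linear(hidden_dim, NUM_TAGS)

    def forward(self, x):                       # x: (batch, n, 128)
        out, _ = self.bilstm(x)                 # (batch, n, 2 * hidden_dim)
        fwd, bwd = out.chunk(2, dim=-1)         # split forward / backward halves
        return self.to_tags(fwd + bwd)          # add them, then project to 21 labels
```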
S603, adjusting the initial probability vector of each character according to the constraint condition; obtaining a target probability vector; the constraint condition is obtained by learning the neural network model through historical training data;
The constraint condition is obtained by the neural network model through learning on historical training data. Optionally, the constraint is learned by the CRF layer during training; for example, the label transition "B-Label1 I-Label1" is valid, whereas "B-Label1 I-Label2" is invalid.
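As a hedged sketch of how such a constraint can be represented, the CRF transition (jump) scores can be stored in a learned matrix whose invalid entries are masked out; the label names, the masking value, and the validity rule below are assumptions for illustration:

```python
import torch

labels = ["O"] + [f"{p}-Label{k}" for k in range(1, 11) for p in ("B", "I")]  # 21 tags
idx = {t: i for i, t in enumerate(labels)}

# Learned transition (jump) scores between the label of character j-1 and character j.
transitions = torch.nn.Parameter(torch.randn(len(labels), len(labels)))

def is_valid(prev: str, curr: str) -> bool:
    # Assumed BIO rule: "I-X" may only follow "B-X" or "I-X" of the same category X.
    if curr.startswith("I-"):
        return prev in (f"B-{curr[2:]}", curr)
    return True

# Mask invalid transitions so the CRF never selects them when adjusting the vectors.
with torch.no_grad():
    for p in labels:
        for c in labels:
            if not is_valid(p, c):
                transitions[idx[p], idx[c]] = -10000.0
```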
S604, based on the target probability vector of each character, determining the accident information category to which each character belongs as a prediction classification result.
According to the scheme of the embodiment of the invention, the accident information determining efficiency can be improved. Furthermore, the neural network model can be trained through the target sample text, and then an information classification model can be obtained. Therefore, when the accident information needs to be extracted, the information classification model can be used for extracting effective text sections containing effective information in the accident reporting text and the accident information class to which each effective text section belongs, and then the accident information is determined based on the effective text sections. Therefore, the scheme provided by the embodiment provides a basis for improving the accident information determination efficiency.
Optionally, in an embodiment, before determining the initial probability vector of the character based on the first character feature of the character, the method further includes:
acquiring a postamble feature and a subsequent character feature of each character according to the reverse character order of the characters in the sample text; wherein the postamble feature is a feature extracted based on all the characters following the character, and the subsequent character feature is: a second character feature extracted based on the character immediately following the character;
calculating a second character feature of the character based on the postamble feature, the subsequent character feature and the character vector of the character;
at this time, determining an initial probability vector of the character based on the first character feature of the character may include:
an initial probability vector for the character is determined based on a first character feature of the character and a second character feature of the character.
According to the scheme of the embodiment of the invention, the accident information determining efficiency can be improved. Furthermore, the neural network model can be trained through the target sample text, and then an information classification model can be obtained. Therefore, when the accident information needs to be extracted, the information classification model can be used for extracting effective text sections containing effective information in the accident reporting text and the accident information class to which each effective text section belongs, and then the accident information is determined based on the effective text sections. Therefore, the scheme provided by the embodiment provides a basis for improving the accident information determination efficiency.
Based on the embodiment shown in fig. 2, as shown in fig. 7, an embodiment of the present invention further provides a method for training the information classification model, where the adjusting of the network parameters of the neural network model based on the difference between the prediction classification result and the calibration result of the target sample text in step S203 may include:
S701, calculating a loss function value of the neural network model based on the prediction classification result and the calibration result of the target sample text, wherein the loss function value is used as the difference between the prediction classification result and the calibration result of the target sample text;
optionally, in an embodiment, the number of the predicted classification results is multiple;
calculating a loss function value of the neural network model based on the prediction classification result and the calibration result of the target sample text, which may include:
step 1: calculating a result score for each predicted classification result;
in one implementation, the result score for each predicted classification result is calculated using the following formula:
$$P_i = e^{S_i}, \qquad S_i = \sum_{j=1}^{n} E_{j,\,y_j} + \sum_{j=2}^{n} T_{y_{j-1},\,y_j}$$

wherein $P_i$ is the result score of the i-th prediction classification result; $E_{j,\,y_j}$ represents the probability of the accident information category to which the j-th character belongs in the i-th prediction result; and $T_{y_{j-1},\,y_j}$ represents the jump (transition) probability between the accident information category to which the (j-1)-th character belongs and the accident information category to which the j-th character belongs in the i-th prediction result.
Step 2: adding the result scores of all the prediction classification results to obtain a prediction score of the neural network model;
Step 3: calculating a loss function value of the neural network model based on the prediction score and the marking score corresponding to the calibration result of the target sample text.
In one implementation, the loss function value of the neural network model is calculated according to the following formula:
$$\mathrm{LossFunction} = -\log\frac{P_{\mathrm{RealPath}}}{P_{\mathrm{total}}}$$

wherein LossFunction is the loss function value, $P_{\mathrm{RealPath}}$ is the marking score corresponding to the calibration result of the target sample text, and $P_{\mathrm{total}}$ is the prediction score.
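To make the computation concrete, the following toy sketch evaluates the path scores, the prediction score and the loss exactly as stated above by brute force; it is an illustrative restatement under made-up emission/transition values, not the patented code (in practice the total score is computed with the forward algorithm rather than by enumeration):

```python
import math
from itertools import product

def path_score(E, T, path):
    """S_i = sum_j E[j][y_j] + sum_j T[y_{j-1}][y_j] for one predicted label path."""
    s = sum(E[j][y] for j, y in enumerate(path))
    s += sum(T[path[j - 1]][path[j]] for j in range(1, len(path)))
    return s

def crf_loss(E, T, real_path):
    """LossFunction = -log(P_RealPath / P_total), enumerating every possible path."""
    n, k = len(E), len(E[0])
    p_total = sum(math.exp(path_score(E, T, p)) for p in product(range(k), repeat=n))
    p_real = math.exp(path_score(E, T, real_path))
    return -math.log(p_real / p_total)

# Toy example: 3 characters, 2 labels (values are made up for illustration).
E = [[1.0, 0.2], [0.3, 0.9], [0.8, 0.1]]   # emission scores E[j][label]
T = [[0.5, -0.2], [0.1, 0.4]]              # transition (jump) scores T[prev][curr]
print(crf_loss(E, T, real_path=[0, 1, 0]))
```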
S702, based on the loss function value, network parameters of the neural network model are adjusted.
According to the scheme of the embodiment of the invention, the accident information determining efficiency can be improved. Furthermore, the neural network model can be trained through the target sample text, and then an information classification model can be obtained. Therefore, when the accident information needs to be extracted, the information classification model can be used for extracting effective text sections containing effective information in the accident reporting text and the accident information class to which each effective text section belongs, and then the accident information is determined based on the effective text sections. Therefore, the scheme provided by the embodiment provides a basis for improving the accident information determination efficiency.
Optionally, in an embodiment, after processing each text segment in the accident reporting text by using an information extraction operation corresponding to the accident information category to which the text segment belongs based on a preset correspondence between the accident information category and the text processing operation to obtain the accident information of the to-be-processed accident recorded in the text segment, the method may further include:
based on the obtained accident information, an accident information report table is generated.
Illustratively, fig. 8 shows an accident information report table provided by an embodiment of the present invention, in which the extracted accident information is filled into the corresponding positions of the report table: the extracted accident time is filled into the corresponding time text box, the extracted accident location into the corresponding location text box, and the extracted accident type into the corresponding type text box, thereby generating the accident information report table.
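Purely for illustration (the field and category names below are assumptions, not the patented schema), the mapping from extracted accident information to the report-table fields of fig. 8 can be sketched as:

```python
def build_report_table(extracted: dict) -> dict:
    """Fill extracted accident information into report-table fields (fig. 8 style)."""
    return {
        "time": extracted.get("accident_time", ""),
        "location": extracted.get("accident_location", ""),
        "type": extracted.get("accident_type", ""),
    }

report = build_report_table({
    "accident_time": "four o'clock today",
    "accident_location": "XX intersection",
    "accident_type": "traffic accident",
})
```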
Corresponding to the method provided above, as shown in fig. 9, an embodiment of the present invention further provides an accident information extraction apparatus, including:
a text obtaining module 901, configured to obtain an accident report receiving text; wherein the accident report receiving text is: a text generated based on an accident reporting dialog for the accident to be processed;
the information classification module 902 is configured to perform accident information classification on the accident receiving text by using a pre-trained information classification model to obtain effective text segments each including effective information in the accident receiving text and an accident information category to which each effective text segment belongs; wherein, the information classification model is as follows: training based on the sample text and the corresponding labeling result, wherein the labeling result is used for indicating: each effective text segment in the sample text and the accident information category to which each effective text segment belongs;
the information extraction module 903 is configured to, for each valid text segment in the accident reporting text, based on a preset correspondence between the accident information category and the text processing operation, process the text segment by using an information extraction operation corresponding to the accident information category to which the text segment belongs, and obtain accident information of the to-be-processed accident recorded by the text segment.
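A minimal sketch of the "preset correspondence between accident information category and text processing operation" that the information extraction module relies on; the category names and the extraction operations themselves are assumptions for illustration, not the patented operations:

```python
import re

# Assumed per-category extraction operations.
def extract_time(segment: str) -> str:
    m = re.search(r"\d{1,2}:\d{2}|\d{1,2} o'clock", segment)
    return m.group(0) if m else segment

def extract_location(segment: str) -> str:
    return segment.strip()  # e.g. keep the whole effective segment as the location

def extract_casualty_count(segment: str) -> str:
    m = re.search(r"\d+", segment)
    return m.group(0) if m else ""

# Preset correspondence: accident information category -> text processing operation.
OPERATIONS = {
    "accident_time": extract_time,
    "accident_location": extract_location,
    "casualties": extract_casualty_count,
}

def extract_accident_info(classified_segments):
    """classified_segments: list of (category, effective_text_segment) pairs."""
    return {cat: OPERATIONS.get(cat, lambda s: s)(seg) for cat, seg in classified_segments}
```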
Optionally, the information classification model is trained according to the following modules:
the text acquisition module is used for acquiring a target sample text from the training sample set;
the text input module is used for inputting the target sample text into the neural network model to be trained so that the neural network model can classify the accident information of the target sample text to obtain a prediction classification result; wherein predicting the classification result comprises: each effective text segment in the target sample text and the accident information category to which each effective text segment belongs;
the parameter adjusting module is used for adjusting network parameters of the neural network model based on the difference between the prediction classification result and the calibration result of the target sample text; and returning to the step of obtaining the target sample text from the training sample set.
Optionally, the labeling result corresponding to each sample text is: the accident information category of each character in the effective text segment of the sample text;
a neural network model, comprising:
the vector conversion module is used for converting each character in the target sample text into a character vector corresponding to the character;
the cyclic processing module is used for sequentially carrying out cyclic processing on the character vectors of the characters based on the character sequence of the characters in the sample text to obtain an initial probability vector of each character; wherein, the numerical value of each dimension in the probability vector of each character is characterized by: a probability that the character belongs to an accident information type corresponding to the dimension;
the probability adjusting module is used for adjusting the initial probability vector of each character according to the constraint condition; obtaining a target probability vector; the constraint condition is obtained by learning the neural network model through historical training data;
and the category prediction module is used for determining the accident information category to which each character belongs based on the target probability vector of each character as a prediction classification result.
Optionally, the loop processing module is specifically configured to, for each character in the characters, obtain an initial probability vector of the character in the following manner: acquiring a preamble feature and a preceding character feature of the character according to the forward character order of the characters in the sample text; wherein the preamble feature is a feature extracted based on all the characters preceding the character, and the preceding character feature is: a first character feature extracted based on the character immediately preceding the character; calculating a first character feature of the character based on the preamble feature, the preceding character feature and the character vector of the character, and determining an initial probability vector of the character based on the first character feature of the character.
Optionally, the loop processing module is further configured to update the preceding feature based on the preceding feature and the character vector of the character after calculating the first character feature of the character based on the preceding feature, the preceding character feature and the character vector of the character.
Optionally, the loop processing module is further configured to, before determining an initial probability vector of the character based on the first character feature of the character, obtain a postamble feature and a subsequent character feature of the character according to the reverse character order of the characters in the sample text; wherein the postamble feature is a feature extracted based on all the characters following the character, and the subsequent character feature is: a second character feature extracted based on the character immediately following the character; and to calculate a second character feature of the character based on the postamble feature, the subsequent character feature and the character vector of the character;
and the loop processing module is specifically used for determining an initial probability vector of the character based on the first character characteristic of the character and the second character characteristic of the character.
Optionally, the parameter adjusting module is specifically configured to calculate a loss function value of the neural network model based on the prediction classification result and the calibration result of the target sample text, as a difference between the prediction classification result and the calibration result of the target sample text; network parameters of the neural network model are adjusted based on the loss function values.
Optionally, the number of the predicted classification results is multiple;
the parameter adjusting module is specifically used for calculating the result score of each prediction classification result; adding the result scores of all the prediction classification results to obtain a prediction score of the neural network model; and calculating a loss function value of the neural network model based on the prediction score and the marking score corresponding to the calibration result of the target sample text.
Optionally, the parameter adjusting module is specifically configured to:
the result score for each predicted classification result is calculated using the following formula:
$$P_i = e^{S_i}, \qquad S_i = \sum_{j=1}^{n} E_{j,\,y_j} + \sum_{j=2}^{n} T_{y_{j-1},\,y_j}$$

wherein $P_i$ is the result score of the i-th prediction classification result; $E_{j,\,y_j}$ represents the probability of the accident information category to which the j-th character belongs in the i-th prediction result; and $T_{y_{j-1},\,y_j}$ represents the jump (transition) probability between the accident information category to which the (j-1)-th character belongs and the accident information category to which the j-th character belongs in the i-th prediction result.
Optionally, the parameter adjusting module is specifically configured to calculate a loss function value of the neural network model according to the following formula:
$$\mathrm{LossFunction} = -\log\frac{P_{\mathrm{RealPath}}}{P_{\mathrm{total}}}$$

wherein LossFunction is the loss function value, $P_{\mathrm{RealPath}}$ is the marking score corresponding to the calibration result of the target sample text, and $P_{\mathrm{total}}$ is the prediction score.
Optionally, the labeling result corresponding to each sample text is: and marking the sample text by adopting a BIO marking mode to obtain a marking result.
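To make the BIO marking mode concrete, a small illustrative example is given below; the category name "Location" and the rendered characters are assumptions, not taken from the training data:

```python
# Illustrative BIO labeling: every character receives B-<category>, I-<category>, or O.
characters = ["X", "X", "路", "口"]  # an "XX intersection" fragment, one character each
bio_labels = ["B-Location", "I-Location", "I-Location", "I-Location"]

labeled = list(zip(characters, bio_labels))  # characters outside any category get "O"
```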
Optionally, the method further comprises: the accident report receiving text generation module is used for generating an accident report receiving text based on an accident report receiving dialogue aiming at the accident to be processed in the following mode, and comprises the following steps: and carrying out voice recognition on the accident reporting conversation aiming at the accident to be processed to generate an accident reporting text.
Optionally, the apparatus further comprises: a report table generating module, configured to generate an accident information report table based on the obtained accident information after the information extraction module has, for each effective text segment in the accident report receiving text, processed the text segment by using the information extraction operation corresponding to the accident information category to which the text segment belongs, based on the preset correspondence between the accident information category and the text processing operation, to obtain the accident information of the to-be-processed accident recorded in the text segment.
According to the technical scheme of the embodiment of the invention, the accident report receiving text of the accident to be processed can be obtained. Accident information classification is then performed on the accident report receiving text by using the pre-trained information classification model, so as to obtain the effective text segments containing effective information in the accident report receiving text and the accident information category to which each effective text segment belongs. For each effective text segment in the accident report receiving text, the text segment is processed by using the information extraction operation corresponding to the accident information category to which it belongs, according to the preset correspondence between accident information categories and text processing operations, so that the accident information of the accident to be processed recorded in the text segment can be obtained.
The embodiment of the present invention further provides an electronic device, as shown in fig. 10, which includes a processor 1001, a communication interface 1002, a memory 1003 and a communication bus 1004, wherein the processor 1001, the communication interface 1002 and the memory 1003 complete mutual communication through the communication bus 1004,
a memory 1003 for storing a computer program;
the processor 1001 is configured to implement the above-described procedure of the accident information extraction method when executing the program stored in the memory 1003.
The electronic device of the embodiment of the invention can acquire the accident report receiving text of the accident to be processed, and then classify the accident report receiving text by using the pre-trained information classification model to obtain the effective text segments containing effective information in the accident report receiving text and the accident information category to which each effective text segment belongs. For each effective text segment in the accident report receiving text, the text segment is processed by using the information extraction operation corresponding to the accident information category to which it belongs, according to the preset correspondence between accident information categories and text processing operations, so that the accident information of the accident to be processed recorded in the text segment can be obtained.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The Processor may be a general-purpose Processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; but also Digital Signal Processors (DSPs), Application Specific Integrated Circuits (ASICs), Field Programmable Gate Arrays (FPGAs) or other Programmable logic devices, discrete Gate or transistor logic devices, discrete hardware components.
In yet another embodiment of the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when executed by a processor, implements the steps of any of the above-mentioned accident information extraction methods.
In yet another embodiment, a computer program product containing instructions is provided, which when run on a computer causes the computer to perform any of the above-described incident information extraction methods.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the invention to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, from one website site, computer, server, or data center to another website site, computer, server, or data center via wired (e.g., coaxial cable, fiber optic, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, hard Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the apparatus, device, and system embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference may be made to some descriptions of the method embodiments for relevant points.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (16)

1. An accident information extraction method, characterized in that the method comprises:
acquiring an accident receiving text; wherein the accident receiving text is: a text generated based on an accident reporting dialog for the accident to be processed;
utilizing a pre-trained information classification model to classify the accident receiving text to obtain effective text sections each containing effective information in the accident receiving text and an accident information category to which each effective text section belongs; wherein, the information classification model is as follows: training based on the sample text and the corresponding labeling result, wherein the labeling result is used for indicating: each effective text segment in the sample text and the accident information category to which each effective text segment belongs;
and aiming at each effective text segment in the accident reporting text, based on the corresponding relation between the preset accident information category and the text processing operation, processing the text segment by adopting the information extraction operation corresponding to the accident information category to which the text segment belongs to obtain the accident information of the accident to be processed recorded by the text segment.
2. The method of claim 1, wherein the information classification model is trained in the following manner:
acquiring a target sample text from the training sample set;
inputting the target sample text into a neural network model to be trained, so that the neural network model classifies accident information of the target sample text to obtain a prediction classification result; wherein the prediction classification result comprises: each effective text segment in the target sample text and the accident information category to which each effective text segment belongs;
adjusting network parameters of the neural network model based on the difference between the prediction classification result and the calibration result of the target sample text; and returning to the step of obtaining the target sample text from the training sample set.
3. The method of claim 2, wherein each sample text corresponds to a labeling result: the accident information category of each character in the effective text segment of the sample text;
the neural network model obtains a prediction classification result in the following mode:
converting each character in the target sample text into a character vector corresponding to the character;
based on the character sequence of each character in the sample text, sequentially carrying out cyclic processing on the character vector of each character to obtain an initial probability vector of each character; wherein, the numerical value of each dimension in the probability vector of each character is characterized by: a probability that the character belongs to an accident information type corresponding to the dimension;
adjusting the initial probability vector of each character according to the constraint condition; obtaining a target probability vector; wherein the constraint condition is obtained by learning the neural network model through historical training data;
and determining the accident information category to which each character belongs as a prediction classification result based on the target probability vector of each character.
4. The method of claim 3, wherein the sequentially performing a cyclic processing on the character vectors of the characters based on the character sequence of the characters in the sample text to obtain an initial probability vector of each character comprises:
for each character in the characters, obtaining an initial probability vector of the character by adopting the following method, wherein the method comprises the following steps:
acquiring the preceding feature and the preceding character feature of the character according to the forward character order of each character in the sample text; wherein the preceding feature is a feature extracted based on all the characters preceding the character; the preceding character feature is: a first character feature extracted based on the character preceding the character;
calculating a first character feature of the character based on the preceding feature, the preceding character feature and the character vector of the character, and determining an initial probability vector of the character based on the first character feature of the character.
5. The method of claim 4, further comprising, after said computing the first character feature of the character based on the preceding feature, the preceding character feature, and the character vector of the character:
updating the preceding feature based on the preceding feature and the character vector of the character.
6. The method according to claim 4 or 5, wherein before determining the initial probability vector of the character based on the first character feature of the character, further comprising:
acquiring the postamble feature and the subsequent character feature of the character according to the reverse character order of the characters in the sample text; wherein the postamble feature is a feature extracted based on all the characters following the character; the subsequent character feature is: a second character feature extracted based on the character subsequent to the character;
calculating a second character feature of the character based on the postamble feature, the subsequent character feature and the character vector of the character;
determining an initial probability vector of the character based on the first character feature of the character, including:
an initial probability vector for the character is determined based on a first character feature of the character and a second character feature of the character.
7. The method according to any one of claims 2-5, wherein the adjusting the network parameters of the neural network model based on the difference between the predicted classification result and the calibration result of the target sample text comprises:
calculating a loss function value of the neural network model based on the prediction classification result and the calibration result of the target sample text as a difference between the prediction classification result and the calibration result of the target sample text;
adjusting network parameters of the neural network model based on the loss function values.
8. The method of claim 7, wherein the predicted classification result is plural;
the calculating a loss function value of the neural network model based on the prediction classification result and the calibration result of the target sample text comprises:
calculating a result score for each predicted classification result;
adding the result scores of all the prediction classification results to obtain a prediction score of the neural network model;
and calculating a loss function value of the neural network model based on the prediction score and the marking score corresponding to the calibration result of the target sample text.
9. The method of claim 8, wherein said calculating a result score for each predictive classification result comprises:
the result score for each predicted classification result is calculated using the following formula:
$$P_i = e^{S_i}, \qquad S_i = \sum_{j=1}^{n} E_{j,\,y_j} + \sum_{j=2}^{n} T_{y_{j-1},\,y_j}$$

wherein $P_i$ is the result score of the i-th prediction classification result; $E_{j,\,y_j}$ represents the probability of the accident information category to which the j-th character belongs in the i-th prediction result; and $T_{y_{j-1},\,y_j}$ represents the jump probability between the accident information category to which the (j-1)-th character belongs and the accident information category to which the j-th character belongs in the i-th prediction result.
10. The method of claim 8 or 9, wherein the calculating a loss function value of the neural network model based on the prediction score and the marking score corresponding to the calibration result of the target sample text comprises:
calculating a loss function value of the neural network model according to the following formula:
$$\mathrm{LossFunction} = -\log\frac{P_{\mathrm{RealPath}}}{P_{\mathrm{total}}}$$

wherein LossFunction is the loss function value, $P_{\mathrm{RealPath}}$ is the marking score corresponding to the calibration result of the target sample text, and $P_{\mathrm{total}}$ is the prediction score.
11. The method of claim 2, wherein each sample text corresponds to a labeling result: and marking the sample text by adopting a BIO marking mode to obtain a marking result.
12. The method of claim 1, wherein generating the incident pickup text based on the incident pickup dialog for the incident to be processed in a manner comprising:
and carrying out voice recognition on the accident reporting conversation aiming at the accident to be processed to generate an accident reporting text.
13. The method according to claim 1, wherein after the text segment is processed by using an information extraction operation corresponding to the accident information category to which the text segment belongs based on a preset correspondence between an accident information category and a text processing operation for each valid text segment in the accident report text to obtain the accident information of the accident to be processed recorded in the text segment, the method further comprises:
based on the obtained accident information, an accident information report table is generated.
14. An accident information extraction apparatus, characterized in that the apparatus comprises:
the text acquisition module is used for acquiring an accident receiving text; wherein the accident receiving text is: a text generated based on an accident reporting dialog for the accident to be processed;
the information classification module is used for classifying the accident information of the accident receiving text by utilizing a pre-trained information classification model to obtain effective text sections containing effective information in the accident receiving text and the accident information category of each effective text section; wherein, the information classification model is as follows: training based on the sample text and the corresponding labeling result, wherein the labeling result is used for indicating: each effective text segment in the sample text and the accident information category to which each effective text segment belongs;
and the information extraction module is used for processing each effective text segment in the accident reporting text by adopting the information extraction operation corresponding to the accident information category to which the text segment belongs based on the corresponding relation between the preset accident information category and the text processing operation so as to obtain the accident information of the to-be-processed accident recorded by the text segment.
15. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1-13 when executing a program stored in the memory.
16. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of claims 1 to 13.
CN202110896545.5A 2021-08-05 2021-08-05 Accident information extraction method and device and electronic equipment Pending CN114385795A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110896545.5A CN114385795A (en) 2021-08-05 2021-08-05 Accident information extraction method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110896545.5A CN114385795A (en) 2021-08-05 2021-08-05 Accident information extraction method and device and electronic equipment

Publications (1)

Publication Number Publication Date
CN114385795A true CN114385795A (en) 2022-04-22

Family

ID=81194509

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110896545.5A Pending CN114385795A (en) 2021-08-05 2021-08-05 Accident information extraction method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN114385795A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107908614A (en) * 2017-10-12 2018-04-13 北京知道未来信息技术有限公司 A kind of name entity recognition method based on Bi LSTM
CN110765265A (en) * 2019-09-06 2020-02-07 平安科技(深圳)有限公司 Information classification extraction method and device, computer equipment and storage medium
CN111428981A (en) * 2020-03-18 2020-07-17 国电南瑞科技股份有限公司 Deep learning-based power grid fault plan information extraction method and system
US20200364407A1 (en) * 2019-05-14 2020-11-19 Korea University Research And Business Foundation Method and server for text classification using multi-task learning
CN112269949A (en) * 2020-10-19 2021-01-26 杭州叙简科技股份有限公司 Information structuring method based on accident disaster news


Similar Documents

Publication Publication Date Title
CN107729309B (en) Deep learning-based Chinese semantic analysis method and device
CN106991085B (en) Entity abbreviation generation method and device
CN112100354B (en) Man-machine conversation method, device, equipment and storage medium
CN104820629A (en) Intelligent system and method for emergently processing public sentiment emergency
CN113688221B (en) Model-based conversation recommendation method, device, computer equipment and storage medium
CN110134950B (en) Automatic text proofreading method combining words
Bokka et al. Deep Learning for Natural Language Processing: Solve your natural language processing problems with smart deep neural networks
CN111414746A (en) Matching statement determination method, device, equipment and storage medium
CN113919366A (en) Semantic matching method and device for power transformer knowledge question answering
CN112395391B (en) Concept graph construction method, device, computer equipment and storage medium
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
CN115687563A (en) Interpretable intelligent judgment method and device, electronic equipment and storage medium
CN115273815A (en) Method, device and equipment for detecting voice keywords and storage medium
CN116050425A (en) Method for establishing pre-training language model, text prediction method and device
CN114491023A (en) Text processing method and device, electronic equipment and storage medium
CN112183060B (en) Reference resolution method of multi-round dialogue system
CN117114707A (en) Training method, prediction method and device for risk-escape anti-fraud prediction model
CN109635289B (en) Entry classification method and audit information extraction method
CN116702765A (en) Event extraction method and device and electronic equipment
CN115376547B (en) Pronunciation evaluation method, pronunciation evaluation device, computer equipment and storage medium
WO2023137903A1 (en) Reply statement determination method and apparatus based on rough semantics, and electronic device
CN113627197B (en) Text intention recognition method, device, equipment and storage medium
CN114385795A (en) Accident information extraction method and device and electronic equipment
CN115906824A (en) Text fine-grained emotion analysis method, system, medium and computing equipment
CN114861626A (en) Traffic warning condition processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information
CB02 Change of applicant information

Address after: 100013 no.a4, Hepingli District 9, Dongcheng District, Beijing

Applicant after: Big data center of emergency management department

Address before: 100013 no.a4, Hepingli District 9, Dongcheng District, Beijing

Applicant before: Communication and information center of emergency management department

RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20220422