CN113111170A - Method and device for extracting alarm receiving and processing text track ground information based on deep learning model

Info

Publication number: CN113111170A
Application number: CN202010307073.0A
Authority: CN
Inventors: 彭涛; 杜晶; 杨欣雨
Original assignee: Beijing Mingyi Technology Co ltd
Current assignee: Beijing Mingyi Technology Co ltd
Priority date: 2020-02-13
Filing date: 2020-04-17
Publication date: 2021-07-13

Abstract

The embodiments of the present disclosure disclose a method and device, based on a deep learning model, for extracting track ground information from alarm receiving and processing text. One embodiment of the method comprises: acquiring an alarm receiving and processing text from which track ground information is to be extracted; performing word segmentation on the text to obtain a corresponding participle sequence; for each participle in the obtained participle sequence, performing the following track ground information classification operation: inputting the word vector corresponding to the participle into a track ground information classification model to obtain a classification result indicating whether the participle is track ground information, wherein the track ground information classification model is obtained by pre-training based on a deep learning model; and determining, according to the participles in the participle sequence whose classification results indicate track ground information, the track ground information set corresponding to the text. This implementation realizes automatic extraction of track ground information from alarm receiving and processing text.

Description

Method and device for extracting alarm receiving and processing text track ground information based on deep learning model

Technical Field

The embodiment of the disclosure relates to the technical field of computers, in particular to a method and a device for extracting alarm receiving and processing text track ground information based on a deep learning model.

Background

Currently, when a call-taker on the 110 emergency line of a public security organization receives an alarm, the call-taker enters an alarm receiving text. After the alarm has been handled, the alarm handler may enter an alarm handling text. The alarm receiving and processing text comprises the alarm receiving text and the alarm handling text. In practice, a large number of alarm receiving and processing texts describe the track ground information of the persons involved. Track ground information may include a track ground identification and a corresponding track ground address. For example, "the track ground is located in City B, Province A" is a piece of track ground information, in which "the track ground is located in" is the track ground identification, indicating that the content following it is the corresponding track ground address, and "City B, Province A" is the track ground address. For case analysts, extracting the track ground information in alarm receiving and processing text is very important. For example, a case analyst can predict where a person is likely to appear in the future based on the track ground information of that person in different alarm receiving and processing texts. As another example, a case analyst may perform data mining on the track ground information of offenders of a designated type in a large number of historical alarm receiving and processing texts, so as to find the track ground paths of offenders of that type and assist in solving future cases.

At present, track ground information in alarm receiving and processing text is extracted manually, but manual extraction is too costly in labor and depends on personal experience.

Disclosure of Invention

The embodiment of the disclosure provides an alarm receiving and processing text track ground information extraction method and device based on a deep learning model.

In a first aspect, an embodiment of the present disclosure provides a method for extracting track ground information from alarm receiving and processing text based on a deep learning model, where the method includes: acquiring an alarm receiving and processing text from which track ground information is to be extracted; performing word segmentation on the text to obtain a corresponding participle sequence; for each participle in the obtained participle sequence, performing the following track ground information classification operation: inputting the word vector corresponding to the participle into a track ground information classification model to obtain a classification result indicating whether the participle is track ground information, wherein the track ground information classification model is obtained by pre-training based on a deep learning model; and determining, according to the participles in the participle sequence whose classification results indicate track ground information, the track ground information set corresponding to the text.

In some embodiments, the track ground information classification model based on the deep learning model is obtained in advance through the following training steps: acquiring a training sample set, wherein each training sample comprises a participle sequence obtained by segmenting a historical alarm receiving and processing text and a labeling information sequence corresponding to the participle sequence, and the labeling information indicates whether the corresponding participle in the participle sequence is track ground information; determining the training samples in the training sample set whose participle sequences include track ground information participles as a positive sample set, wherein a track ground information participle is a participle whose corresponding labeling information indicates that it is track ground information; determining a text feature vector of each positive sample according to the track ground information participles included in the participle sequence of that positive sample; and training an initial deep learning model by taking the text feature vectors of the positive samples in the positive sample set as input and taking a classification result indicating track ground information as the corresponding expected output, so as to obtain the track ground information classification model.

In some embodiments, the training step further comprises: inputting preset negative sample feature vectors into the track ground information classification model to obtain corresponding actual output results; and adjusting the model parameters of the track ground information classification model according to the difference between each actual output result and a classification result indicating that the input is not track ground information.

In some embodiments, determining the text feature vector of each positive sample according to the track ground information participles included in the participle sequence of each positive sample in the positive sample set includes: for each positive sample in the positive sample set, performing the following vector generation and assignment operations: generating a text feature vector corresponding to the positive sample, wherein the components of the generated text feature vector correspond one to one to the words in a preset dictionary; for each track ground information participle in the participle sequence of the positive sample, setting the component corresponding to that participle in the generated text feature vector to the term frequency-inverse document frequency (TF-IDF) of that participle; and setting each unassigned component in the generated text feature vector to a preset numerical value, wherein an unassigned component is a component corresponding to a word that belongs to the preset dictionary but is not a track ground information participle in the participle sequence of the positive sample.

In a second aspect, an embodiment of the present disclosure provides an apparatus for extracting track ground information from alarm receiving and processing text based on a deep learning model, where the apparatus includes: an acquisition unit configured to acquire an alarm receiving and processing text from which track ground information is to be extracted; a word segmentation unit configured to perform word segmentation on the text to obtain a corresponding participle sequence; a classification unit configured to perform, for each participle in the obtained participle sequence, the following track ground information classification operation: inputting the word vector corresponding to the participle into a track ground information classification model to obtain a classification result indicating whether the participle is track ground information, wherein the track ground information classification model is obtained by pre-training based on a deep learning model; and a determining unit configured to determine, according to the participles in the participle sequence whose classification results indicate track ground information, the track ground information set corresponding to the text.

In a third aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device, on which one or more programs are stored, which, when executed by the one or more processors, cause the one or more processors to implement the method as described in any implementation manner of the first aspect.

In a fourth aspect, the present disclosure provides a computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by one or more processors, implements the method as described in any implementation manner of the first aspect.

In the prior art, track ground information in alarm receiving and processing text is generally extracted manually, which may cause the following problems: (1) a large number of historical alarm receiving and processing texts have never had their track ground information extracted, and alarm receiving and handling personnel enter a large number of new texts every day, so the volume of text awaiting extraction is too large and the labor and time costs of manual extraction are too high; (2) alarm receiving and processing texts are mostly written in natural language, and their wording is highly colloquial and irregular, which makes manual extraction of track ground information difficult; (3) the track ground identifications and address information in track ground information fall into many categories, the extraction approach differs from category to category, and extraction relies on personal experience, so the learning cost of manual extraction is high.

According to the method and device provided by the embodiments of the present disclosure, the alarm receiving and processing text from which track ground information is to be extracted is segmented to obtain a corresponding participle sequence, and then, for each participle in the obtained sequence, the word vector corresponding to the participle is input into a pre-trained track ground information classification model, so as to extract the track ground information in the text. The track ground information classification model is thus used effectively to extract track ground information from alarm receiving and processing text automatically, without manual operation, which reduces the cost of extraction and increases its speed.

Drawings

Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;

FIG. 2 is a flow diagram of one embodiment of a deep learning model-based method for extracting information of an alarm receiving text trajectory according to the present disclosure;

FIG. 3 is a flow chart of one embodiment of training steps according to the present disclosure;

FIG. 4 is a schematic structural diagram of an embodiment of an alarm receiving text trajectory information extraction apparatus based on a deep learning model according to the present disclosure;

FIG. 5 is a schematic block diagram of a computer system suitable for use in implementing an electronic device of an embodiment of the present disclosure.

Detailed Description

The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 illustrates an exemplary system architecture 100 to which an embodiment of a deep learning model-based alarm-receiving text trajectory ground information extraction method or a deep learning model-based alarm-receiving text trajectory ground information extraction apparatus of the present disclosure may be applied.

As shown in fig. 1, system architecture 100 may include terminal device 101, network 102, and server 103. Network 102 is the medium used to provide communication links between terminal devices 101 and server 103. Network 102 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

A user may use terminal device 101 to interact with server 103 over network 102 to receive or send messages and the like. Various communication client applications, such as an alarm receiving and processing record application, an alarm receiving and processing text track ground information extraction application, a web browser application, and the like, may be installed on the terminal device 101.

The terminal device 101 may be hardware or software. When the terminal device 101 is hardware, it may be any of various electronic devices that have a display screen and support text input, including but not limited to smart phones, tablet computers, laptop computers, desktop computers, and the like. When the terminal device 101 is software, it can be installed in the electronic devices listed above. It may be implemented as multiple pieces of software or software modules (e.g., to provide a service for extracting track ground information from alarm receiving and processing text) or as a single piece of software or software module. No particular limitation is imposed here.

The server 103 may be a server providing various services, such as a background server providing trajectory information extraction for the alarm receiving text sent by the terminal device 101. The background server can analyze and process the received alarm receiving and processing text, and feed back the processing result (such as track ground information) to the terminal device.

In some cases, the method for extracting track ground information from alarm receiving and processing text based on a deep learning model provided by the embodiments of the present disclosure may be executed jointly by the terminal device 101 and the server 103. For example, the step of "acquiring an alarm receiving and processing text from which track ground information is to be extracted" may be executed by the terminal device 101, and the remaining steps may be executed by the server 103. The present disclosure is not limited in this respect. Accordingly, the apparatus for extracting track ground information from alarm receiving and processing text based on a deep learning model may also be arranged partly in the terminal device 101 and partly in the server 103.

In some cases, the method for extracting information of an alarm receiving and processing text trajectory based on a deep learning model provided by the embodiment of the present disclosure may be executed by the server 103, and accordingly, an apparatus for extracting information of an alarm receiving and processing text trajectory based on a deep learning model may also be disposed in the server 103, in this case, the system architecture 100 may also not include the terminal device 101.

In some cases, the method for extracting information of an alarm receiving and processing text trajectory based on a deep learning model provided by the embodiment of the present disclosure may be executed by the terminal device 101, and accordingly, an apparatus for extracting information of an alarm receiving and processing text trajectory based on a deep learning model may also be disposed in the terminal device 101, and in this case, the system architecture 100 may not include the server 103.

The server 103 may be hardware or software. When the server 103 is hardware, it may be implemented as a distributed server cluster composed of a plurality of servers, or as a single server. When the server 103 is software, it may be implemented as multiple pieces of software or software modules (for example, to provide a service for extracting track ground information from alarm receiving and processing text), or as a single piece of software or software module. No particular limitation is imposed here.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.

With continued reference to fig. 2, a flow 200 of one embodiment of a deep learning model-based method for extracting information of an alarm receiving text trajectory according to the present disclosure is shown. The method for extracting the alarm receiving and processing text track ground information based on the deep learning model comprises the following steps:

step 201, obtaining a track ground information alarm receiving and processing text to be extracted.

In this embodiment, the execution body of the method (for example, the server shown in fig. 1) may obtain a locally stored alarm receiving and processing text from which track ground information is to be extracted, or may obtain such a text remotely from another electronic device (for example, the terminal device shown in fig. 1) connected to the execution body through a network.

Here, the alarm receiving and processing text from which track ground information is to be extracted may be text compiled by a call-taker from the content of an alarm call, or text compiled by an alarm handler from the alarm handling process. It may also be an alarm text received from a terminal device, entered by a user in an alarm application installed on the terminal device or on a web page with an alarm function.

Step 202, performing word segmentation on the alarm receiving and processing text of the track ground information to be extracted to obtain a corresponding word segmentation sequence.

In this embodiment, the execution body may adopt various implementations to perform word segmentation on the alarm receiving and processing text from which track ground information is to be extracted, so as to obtain a corresponding participle sequence. It should be noted that word segmentation of text is prior art that has been widely studied and applied in this field, and it is not described in detail here. For example, a word segmentation method based on string matching, on understanding, or on statistics may be employed. For example, performing word segmentation on the text "Zhang San was found; the track ground is at Hotel C, City B, Province A" yields the participle sequence "Zhang San/found/track/ground/at/A/province/B/city/C/hotel".

Step 203, performing a track ground information classification operation on each participle in the obtained participle sequence.

In this embodiment, the execution body may perform the track ground information classification operation on each participle in the participle sequence obtained in step 202. Here, the track ground information classification operation inputs the word vector corresponding to the participle into the track ground information classification model and obtains a classification result indicating whether the participle is track ground information.

Here, the track ground information classification model is trained in advance based on a deep learning model.

In this embodiment, the execution subject may first determine a word vector corresponding to the word segmentation in various implementations.

In some optional implementations, the word vector corresponding to the participle may include N components, where N is a positive integer, and the N components correspond one to one to the words in the preset dictionary. In determining the word vector corresponding to the participle, the component corresponding to the participle may be set to a first preset value (e.g., 1), and the other components of the word vector (i.e., the components corresponding to words in the preset dictionary other than the participle) may be set to a second preset value (e.g., 0).
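The one-hot style word vector described above can be sketched as follows. This is a minimal illustration, not the implementation of the disclosure: the five-word `PRESET_DICTIONARY`, the function name `one_hot_word_vector`, and the choice to return an all-zero vector for a participle that is not in the dictionary are assumptions made here, since the source does not say how out-of-dictionary participles are handled.

```python
def one_hot_word_vector(participle, dictionary, hit_value=1.0, miss_value=0.0):
    """Return an N-dimensional vector with one component per dictionary word."""
    # hit_value plays the role of the first preset value, miss_value the second.
    vector = [miss_value] * len(dictionary)
    if participle in dictionary:
        vector[dictionary.index(participle)] = hit_value
    # Assumption: a participle outside the dictionary yields all miss_value.
    return vector


# Hypothetical five-word preset dictionary, for illustration only.
PRESET_DICTIONARY = ["track", "ground", "at", "hotel", "park"]

vector = one_hot_word_vector("hotel", PRESET_DICTIONARY)
print(vector)
```

A real preset dictionary would be far larger, so a sparse representation would likely be preferable in practice.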

In some optional implementations, the word vector corresponding to the participle may include N components, where N is a positive integer, and the N components correspond one to one to the words in the preset dictionary. In determining the word vector corresponding to the participle, the execution body may first calculate the term frequency-inverse document frequency (TF-IDF) of the participle in the alarm receiving and processing text from which track ground information is to be extracted, then set the component corresponding to the participle in the word vector to the calculated TF-IDF, and finally set the other components of the word vector (i.e., the components corresponding to words in the preset dictionary other than the participle) to a third preset value (e.g., 0).
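A possible sketch of the TF-IDF variant is given below. The source does not fix a TF-IDF formula or say which corpus supplies the document frequencies, so several things here are assumptions: term frequency is taken as count divided by sequence length, the inverse document frequency uses a smoothed variant, and `CORPUS` is a hypothetical collection of historical participle sequences. The names `tf_idf` and `tf_idf_word_vector` are likewise illustrative.

```python
import math


def tf_idf(participle, tokens, corpus):
    """TF-IDF of one participle in one participle sequence, against a corpus."""
    tf = tokens.count(participle) / len(tokens)
    df = sum(1 for doc in corpus if participle in doc)
    # Smoothed IDF, one common variant; the source does not specify a formula.
    idf = math.log((1 + len(corpus)) / (1 + df)) + 1.0
    return tf * idf


def tf_idf_word_vector(participle, tokens, corpus, dictionary, default=0.0):
    """Vector with the participle's TF-IDF at its dictionary position."""
    # default plays the role of the third preset value.
    vector = [default] * len(dictionary)
    if participle in dictionary:
        vector[dictionary.index(participle)] = tf_idf(participle, tokens, corpus)
    return vector


# Hypothetical dictionary and corpus, for illustration only.
PRESET_DICTIONARY = ["track", "ground", "at", "hotel", "park"]
CORPUS = [
    ["track", "ground", "at", "hotel"],
    ["appeared", "at", "park"],
    ["lost", "wallet"],
]

tokens = ["track", "ground", "at", "hotel"]
vector = tf_idf_word_vector("hotel", tokens, CORPUS, PRESET_DICTIONARY)
print(vector)
```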

Then, the execution body may input the word vector corresponding to the participle into the track ground information classification model and obtain a classification result indicating whether the participle is track ground information. For example, suppose the alarm receiving and processing text from which track ground information is to be extracted is "Zhang San was found two days ago with the track ground at Hotel A, and today appeared again at Park B", and the corresponding participle sequence is "Zhang San/two/days/ago/was/found/track/ground/at/A/hotel/today/again/appeared/at/B/park". Table 1 shows the classification results obtained by inputting the word vector of each participle in the participle sequence into the track ground information classification model.

TABLE 1

Participle	Classification result
Zhang San	No
two	No
days	No
ago	No
was	No
found	No
track	Yes
ground	Yes
at	Yes
A	Yes
hotel	Yes
today	No
again	No
appeared	Yes
at	Yes
B	Yes
park	Yes

Step 204, determining the track ground information set corresponding to the alarm receiving and processing text according to the participles in the participle sequence whose classification results indicate track ground information.

Here, in step 203, the word vector of a participle in the participle sequence is input into the track ground information classification model to obtain a classification result indicating whether the participle is track ground information; if the classification result indicates that it is, the participle is a track ground information participle. In step 204, the execution body may adopt various implementations to determine, from the track ground information participles in the participle sequence, the track ground information set corresponding to the alarm receiving and processing text.

In some optional implementations, the execution body may determine each track ground information participle in the participle sequence as a piece of track ground information in the track ground information set. This implementation is better suited to cases where each participle obtained by word segmentation is itself relatively complete track ground information.

In some optional implementations, the execution body may also merge directly adjacent track ground information participles in the participle sequence into one piece of track ground information, and take each piece obtained in this way as track ground information in the track ground information set. This implementation is better suited to cases where each participle obtained by word segmentation is relatively short and cannot by itself form complete track ground information. Continuing with the above example, the participle sequence is "Zhang San/two/days/ago/was/found/track/ground/at/A/hotel/today/again/appeared/at/B/park", and according to the classification results in Table 1 the track ground information participles are: "track", "ground", "at", "A", "hotel", "appeared", "at", "B", "park". To form track ground information with practical meaning, directly adjacent track ground information participles can be merged according to their positions in the participle sequence, which gives the pieces of track ground information in the set. Here, for example, the set { "track ground at A hotel", "appeared at B park" } is obtained.
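The merging of directly adjacent track ground information participles can be sketched as below, using the example participle sequence and the classification results of Table 1. The function name `merge_adjacent` is hypothetical, and joining with a space is an assumption made for the English rendering (the original Chinese participles would presumably be concatenated directly). The sketch returns an ordered list rather than a set so that the order of appearance in the text is kept.

```python
def merge_adjacent(tokens, flags, separator=" "):
    """Merge runs of consecutive flagged participles into single strings."""
    merged, run = [], []
    for token, flag in zip(tokens, flags):
        if flag:
            run.append(token)
        elif run:
            # A non-flagged participle closes the current run.
            merged.append(separator.join(run))
            run = []
    if run:
        # Flush a run that reaches the end of the sequence.
        merged.append(separator.join(run))
    return merged


TOKENS = ["Zhang San", "two", "days", "ago", "was", "found", "track", "ground",
          "at", "A", "hotel", "today", "again", "appeared", "at", "B", "park"]
# 1 = classified as track ground information, 0 = not (per Table 1).
FLAGS = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 1]

result = merge_adjacent(TOKENS, FLAGS)
print(result)
```

If no participle is flagged, the function returns an empty list, which corresponds to an empty track ground information set.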

It should be noted that the alarm receiving and processing text from which track ground information is to be extracted may not include any track ground information, in which case the corresponding track ground information set may be empty. The text may also include at least one piece of track ground information, in which case the corresponding track ground information set may include at least one piece of track ground information.

In some alternative implementations, the trajectory-based information classification model based on the deep learning model may be obtained by pre-training through a training step as shown in fig. 3. Referring to fig. 3, fig. 3 illustrates a flow 300 of one embodiment of training steps according to the present disclosure. The training step comprises the following steps:

Here, the execution body of the training step may be the same as that of the above method for extracting track ground information from alarm receiving and processing text based on a deep learning model. In this case, after obtaining the track ground information classification model through training, the execution body of the training step may store the model parameters locally and read the trained model parameters when executing the extraction method.

Here, the execution subject of the training step may also be different from the execution subject of the above-described alarm receiving text trajectory information extraction method based on the deep learning model. In this way, the executing agent of the training step may send the model parameters of the trajectory-ground information classification model to the executing agent of the method for extracting the alarm receiving text trajectory-ground information based on the deep learning model after the trajectory-ground information classification model is obtained through training. In this way, the executing agent of the deep learning model-based alarm receiving text trajectory ground information extracting method may read the model parameters of the trajectory ground information classification model received from the executing agent of the training step in the process of executing the deep learning model-based alarm receiving text trajectory ground information extracting method.

Step 301, a training sample set is obtained.

Here, the performing subject of the training step may first obtain a set of training samples. Each training sample comprises a word segmentation sequence obtained by segmenting a historical alarm receiving and processing text and a labeling information sequence corresponding to the word segmentation sequence, wherein the labeling information is used for indicating whether corresponding words in the word segmentation sequence are track ground information or not.

As an example, the training sample may include a sequence of tokens "zhang san/bi/day/front/found/track/ground/at/a/hotel/today/again/present/at/b/park" and a sequence of annotation information "0/0/0/0/0/0/1/1/1/1/1/0/0/1/1/1/1," where "0" is used to indicate that its corresponding token is not track-ground information and "1" is used to indicate that its corresponding token is track-ground information.

In practice, a word segmentation method can be used for manually segmenting the historical alarm receiving and processing text to obtain a word segmentation sequence and labeling each word segmentation in the word segmentation sequence to obtain a corresponding labeled information sequence.

Step 302, the training samples in the training sample set whose word segmentation sequences include track-ground information participles are determined as a positive sample set.

Here, a track-ground information participle is a participle in the word segmentation sequence whose corresponding labeling information indicates that it is track-ground information.
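Steps 301 and 302 can be sketched as follows. This is an illustration only: the `TrainingSample` class, the function names, and the two toy samples are hypothetical and do not come from the disclosure.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TrainingSample:
    tokens: List[str]  # word segmentation sequence
    labels: List[int]  # labeling information: 1 = track-ground information, 0 = not


def select_positive_samples(samples: List[TrainingSample]) -> List[TrainingSample]:
    """Step 302: keep the samples that contain at least one participle labeled 1."""
    return [s for s in samples if any(label == 1 for label in s.labels)]


def track_ground_participles(sample: TrainingSample) -> List[str]:
    """Return the participles whose labeling information is 1."""
    return [tok for tok, label in zip(sample.tokens, sample.labels) if label == 1]


# Two toy samples (hypothetical data).
samples = [
    TrainingSample(["found", "at", "first", "hotel"], [0, 1, 1, 1]),
    TrainingSample(["caller", "reported", "noise"], [0, 0, 0]),
]
positive_samples = select_positive_samples(samples)
```

Only the first toy sample contains a participle labeled 1, so it alone enters the positive sample set.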

Step 303, the text feature vector of each positive sample in the positive sample set is determined according to the track-ground information participles included in the word segmentation sequence of that positive sample.

In this embodiment, the execution subject of the training step may determine, for each positive sample in the positive sample set determined in step 302, the text feature vector of the positive sample according to the track-ground information participles included in the word segmentation sequence of the positive sample.

In some alternative implementations, step 303 may proceed as follows: if the preset dictionary includes N words, where N is a positive integer, the text feature vector of the positive sample may include N components, and the N components correspond one-to-one to the words of the preset dictionary. The text feature vector of the positive sample may be determined as follows: for each track-ground information participle in the word segmentation sequence of the positive sample, the component corresponding to that participle in the text feature vector of the positive sample is set to a fourth preset numerical value (e.g., 1), and each unassigned component in the text feature vector of the positive sample is set to a fifth preset numerical value (e.g., 0), wherein an unassigned component is a component corresponding to a word that belongs to the preset dictionary but is not any track-ground information participle in the word segmentation sequence of the positive sample.

For ease of understanding, an example follows. Assume that the preset dictionary includes 20 words, and that the positive sample includes the word segmentation sequence "Zhang San/two/days/before/by/found/track/ground/at/first/hotel/today/again/present/at/second/park" and the labeling information sequence "0/0/0/0/0/0/1/1/1/1/1/0/0/1/1/1/1", where "0" indicates that the corresponding participle is not track-ground information and "1" indicates that the corresponding participle is track-ground information. For each participle in the word segmentation sequence of the positive sample, if the participle is a track-ground information participle, the component corresponding to that participle in the 20-dimensional text feature vector of the positive sample may be set to 1. Whether a participle is a track-ground information participle can be determined from the labeling information sequence corresponding to the word segmentation sequence. From the above labeling information sequence, "track", "ground", "at", "first", "hotel", "present", "second" and "park" are track-ground information participles ("at" occurs twice in the sequence but corresponds to a single dictionary word). If the components corresponding to these participles in the preset dictionary are the 2nd, 4th, 5th, 6th, 7th, 10th, 11th and 15th dimensions respectively, then the 2nd, 4th, 5th, 6th, 7th, 10th, 11th and 15th dimensions of the 20-dimensional text feature vector of the positive sample may each be set to 1.
Then, each unassigned component in the text feature vector of the positive sample may be set to 0, that is, all components other than the 2nd, 4th, 5th, 6th, 7th, 10th, 11th and 15th dimensions may be set to 0, so as to obtain the following text feature vector: (0,1,0,1,1,1,1,0,0,1,1,0,0,0,1,0,0,0,0,0).
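The worked example above can be reproduced with a short sketch. The 20-word dictionary below is hypothetical: only the positions of the eight track-ground words are taken from the example, the remaining entries ("w1", "w3", and so on) are placeholders, and the English participle strings are an assumed rendering of the example sequence.

```python
from typing import List, Sequence


def binary_feature_vector(tokens: Sequence[str], labels: Sequence[int],
                          dictionary: Sequence[str],
                          hit: int = 1, miss: int = 0) -> List[int]:
    """Set the component of every labeled participle to `hit`, all others to `miss`."""
    index = {word: i for i, word in enumerate(dictionary)}
    vector = [miss] * len(dictionary)
    for token, label in zip(tokens, labels):
        if label == 1 and token in index:
            vector[index[token]] = hit
    return vector


# Hypothetical dictionary: the example's words sit at dimensions 2, 4, 5, 6, 7, 10, 11, 15.
dictionary = ["w1", "track", "w3", "ground", "at", "first", "hotel", "w8", "w9",
              "present", "second", "w12", "w13", "w14", "park",
              "w16", "w17", "w18", "w19", "w20"]

tokens = ("Zhang San/two/days/before/by/found/track/ground/at/first/hotel/"
          "today/again/present/at/second/park").split("/")
labels = [int(x) for x in "0/0/0/0/0/0/1/1/1/1/1/0/0/1/1/1/1".split("/")]

vector = binary_feature_vector(tokens, labels, dictionary)
# vector == [0, 1, 0, 1, 1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
```

The `hit` and `miss` arguments play the roles of the fourth and fifth preset numerical values.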

In some alternative implementations, step 303 may also proceed as follows:

For each positive sample in the positive sample set, the following vector generation and assignment operations are performed:

First, a text feature vector corresponding to the positive sample is generated. Here, the components of the generated text feature vector correspond one-to-one to the words in the preset dictionary.

Second, for each track-ground information participle in the word segmentation sequence of the positive sample, the component corresponding to that participle in the generated text feature vector is set to the term frequency-inverse document frequency (TF-IDF) index of the participle.

Finally, each unassigned component in the generated text feature vector is set to a preset numerical value. Here, an unassigned component is a component corresponding to a word that belongs to the preset dictionary but is not any track-ground information participle in the word segmentation sequence of the positive sample.

To facilitate understanding, the above example is continued. Unlike in the above example, in the text feature vector generated here, the 2nd, 4th, 5th, 6th, 7th, 10th, 11th and 15th components, which correspond to "track", "ground", "at", "first", "hotel", "present", "second" and "park", are not set to 1 but are set to the TF-IDF indexes of those participles, namely 0.21, 0.41, 0.85, 0.83, 0.57, 0.34, 0.54 and 0.71 respectively. Then, each unassigned component in the text feature vector of the positive sample may be set to 0, that is, all components other than the 2nd, 4th, 5th, 6th, 7th, 10th, 11th and 15th dimensions may be set to 0, so as to obtain the following text feature vector: (0,0.21,0,0.41,0.85,0.83,0.57,0,0,0.34,0.54,0,0,0,0.71,0,0,0,0,0).
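The weighted variant can be sketched in the same way. The sketch takes the TF-IDF indexes from the example as given and does not compute them, since the disclosure does not state the exact TF-IDF formula or corpus. The dictionary is the same hypothetical 20-word layout, with placeholder words at the positions the example leaves unspecified.

```python
from typing import Dict, List, Sequence


def weighted_feature_vector(track_tokens: Sequence[str],
                            weights: Dict[str, float],
                            dictionary: Sequence[str],
                            default: float = 0.0) -> List[float]:
    """Set each track-ground participle's component to its weight, all others to `default`."""
    index = {word: i for i, word in enumerate(dictionary)}
    vector = [default] * len(dictionary)
    for token in track_tokens:
        if token in index:
            vector[index[token]] = weights[token]
    return vector


# Hypothetical dictionary layout (placeholders "w1", "w3", ... are assumptions).
dictionary = ["w1", "track", "w3", "ground", "at", "first", "hotel", "w8", "w9",
              "present", "second", "w12", "w13", "w14", "park",
              "w16", "w17", "w18", "w19", "w20"]

# TF-IDF indexes as stated in the example.
tfidf = {"track": 0.21, "ground": 0.41, "at": 0.85, "first": 0.83,
         "hotel": 0.57, "present": 0.34, "second": 0.54, "park": 0.71}

vector = weighted_feature_vector(list(tfidf), tfidf, dictionary)
# vector == [0.0, 0.21, 0.0, 0.41, 0.85, 0.83, 0.57, 0.0, 0.0,
#            0.34, 0.54, 0.0, 0.0, 0.0, 0.71, 0.0, 0.0, 0.0, 0.0, 0.0]
```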

Step 304, an initial deep learning model is trained with the text feature vectors of the positive samples in the positive sample set as input and the classification result indicating track-ground information as the corresponding expected output, to obtain the track-ground information classification model.

Here, using the positive sample set, the execution subject of the training step may train the initial deep learning model with the text feature vector of each positive sample in the positive sample set as input and the classification result indicating track-ground information as the corresponding expected output, to obtain the track-ground information classification model. Specifically, this may proceed as follows:

First, the model structure of the initial deep learning model may be determined.

Here, the initial deep learning model may include various deep learning models. For example, the initial deep learning model may include at least one of: a convolutional neural network, a recurrent neural network, a long short-term memory network, and a conditional random field.

By way of example, if the initial deep learning model is determined to be a convolutional neural network, it may be determined which layers the convolutional neural network includes (for example, which convolutional layers, pooling layers and fully-connected layers) and the order in which the layers are connected. If convolutional layers are included, the convolution kernel size and the convolution stride of each convolutional layer may be determined. If a pooling layer is included, the pooling method may be determined.

Second, initial values of model parameters included in the initial deep learning model may be determined.

For example, if the initial deep learning model is determined to be a convolutional neural network, here, convolutional kernel parameters of convolutional layers that may be included in the convolutional neural network may be initialized, connection parameters for fully-connected layers may be initialized, and so on.
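The two preparatory steps (fixing the structure, then initializing the parameters) might be sketched as below. The layer specification classes, the concrete layer sizes, and the Gaussian initialization are all assumptions made for illustration; the disclosure leaves these choices open. Convolution kernels are assumed to act on a single input channel.

```python
import random
from dataclasses import dataclass


@dataclass
class ConvSpec:
    kernel_size: int
    stride: int
    out_channels: int


@dataclass
class PoolSpec:
    method: str  # e.g. "max" or "average"


@dataclass
class DenseSpec:
    in_features: int
    out_features: int


# Hypothetical structure: which layers exist and in what order.
structure = [
    ConvSpec(kernel_size=3, stride=1, out_channels=4),
    PoolSpec(method="max"),
    DenseSpec(in_features=4, out_features=1),
]


def init_parameters(structure, seed=0):
    """Draw small random initial values for every trainable layer."""
    rng = random.Random(seed)
    params = []
    for layer in structure:
        if isinstance(layer, ConvSpec):
            # One 1-D kernel per output channel (single input channel assumed).
            params.append([[rng.gauss(0.0, 0.1) for _ in range(layer.kernel_size)]
                           for _ in range(layer.out_channels)])
        elif isinstance(layer, DenseSpec):
            # One row of connection weights per output unit.
            params.append([[rng.gauss(0.0, 0.1) for _ in range(layer.in_features)]
                           for _ in range(layer.out_features)])
        else:
            params.append(None)  # max/average pooling has no trainable parameters
    return params


initial_params = init_parameters(structure)
```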

Finally, a parameter adjustment operation may be performed on the positive samples in the positive sample set until a preset training end condition is satisfied, where the parameter adjustment operation may include: inputting the text feature vector of the positive sample into the initial deep learning model to obtain a corresponding actual output result, calculating the difference between the actual output result and the classification result indicating track-ground information, and adjusting the model parameters of the initial deep learning model based on the obtained difference. Here, the training end condition may include, for example, at least one of: the number of times the parameter adjustment operation has been executed reaches a preset maximum number of training iterations; or the calculated difference is smaller than a preset difference threshold.

Through the parameter adjustment operation, the model parameters of the initial deep learning model are optimized, and the initial deep learning model after parameter optimization can be determined as the track-ground information classification model. It should be noted that how to adjust and optimize the model parameters of the initial deep learning model based on the calculated difference is prior art that is widely studied and applied in the field (for example, a gradient descent method may be employed), and is not described in detail here.
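The parameter adjustment loop and its two end conditions can be illustrated with a deliberately small stand-in. The sketch below uses a single logistic unit trained by gradient descent in place of a deep learning model, and treats each pass over the positive samples as one parameter adjustment operation. The function names, learning rate, thresholds, and toy vectors are assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Actual output: estimated probability that x indicates track-ground information."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train_on_positive_samples(vectors, max_steps=1000, diff_threshold=0.05,
                              learning_rate=0.5):
    weights, bias = [0.0] * len(vectors[0]), 0.0
    steps = 0
    while steps < max_steps:                      # end condition 1: maximum count
        worst_diff = 0.0
        for x in vectors:
            diff = 1.0 - predict(weights, bias, x)   # expected output is 1
            # Gradient descent step on the cross-entropy loss of the logistic unit.
            for i, xi in enumerate(x):
                weights[i] += learning_rate * diff * xi
            bias += learning_rate * diff
            worst_diff = max(worst_diff, diff)
        steps += 1
        if worst_diff < diff_threshold:           # end condition 2: small difference
            break
    return weights, bias, steps


# Toy text feature vectors of two positive samples (hypothetical).
positive_vectors = [[0.0, 1.0, 1.0], [1.0, 0.0, 1.0]]
weights, bias, steps = train_on_positive_samples(positive_vectors)
```

With a real convolutional or recurrent network the update rule would come from backpropagation, but the control flow (compute output, compare with expected output, adjust, check the end conditions) stays the same.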

In some optional implementations, the flow 300 may further include the following steps 305 and 306:

Step 305, a preset negative sample feature vector is input into the track-ground information classification model to obtain a corresponding actual output result.

Here, the negative sample feature vector is a feature vector used to characterize a negative sample, and a negative sample is a training sample in the training sample set whose word segmentation sequence does not include any track-ground information participle. Since the word segmentation sequence of a negative sample includes no track-ground information participle, all negative samples can be characterized by the same preset negative sample feature vector.

For example, when the text feature vector of the positive sample adopts the first optional implementation described in step 303, that is, the fourth preset numerical value and the fifth preset numerical value are used to represent track-ground information participles and non-track-ground information participles respectively, the preset negative sample feature vector here may be a feature vector in which every component is the fifth preset numerical value. That is, if the text feature vector of the positive sample has 20 dimensions and the fifth preset numerical value is 0, the preset negative sample feature vector may be: (0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0).

For another example, when the text feature vector of the positive sample adopts the second optional implementation described in step 303, that is, the TF-IDF index and the preset numerical value are used to represent track-ground information participles and non-track-ground information participles respectively, the preset negative sample feature vector here may be a feature vector in which every component is the preset numerical value.

Step 306, the model parameters of the track-ground information classification model are adjusted according to the difference between the obtained actual output result and the classification result indicating that the input is not track-ground information.
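Steps 305 and 306 can be sketched with the same kind of stand-in model: a single logistic unit with hypothetical parameters, not the deep learning model of the disclosure. The preset negative sample feature vector is all zeros and the expected output is 0.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def negative_sample_update(weights, bias, negative_vector, learning_rate=0.5):
    """Step 305: compute the actual output. Step 306: adjust toward expected output 0."""
    output = sigmoid(sum(w * x for w, x in zip(weights, negative_vector)) + bias)
    diff = output - 0.0  # difference from "not track-ground information"
    new_weights = [w - learning_rate * diff * x
                   for w, x in zip(weights, negative_vector)]
    new_bias = bias - learning_rate * diff
    return new_weights, new_bias, output


# Hypothetical parameters of an already trained stand-in model.
weights, bias = [0.4, -0.2, 0.7], 1.0
negative_vector = [0.0, 0.0, 0.0]  # every component is the preset value 0

new_weights, new_bias, before = negative_sample_update(weights, bias, negative_vector)
after = sigmoid(sum(w * x for w, x in zip(new_weights, negative_vector)) + new_bias)
```

In this stand-in, an all-zero input leaves the weights unchanged and only lowers the bias, so the output for the negative sample vector moves toward 0 while the weights learned from positive samples are preserved.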

By using the training steps shown in the above flow 300, the track-ground information classification model can be generated automatically, which reduces the labor cost of generating the track-ground information classification model. People's modes of expression change over time, and this change is reflected in alarm receiving and processing texts; new kinds of track-ground information may also appear as society develops. In that case, a new training sample set can be obtained and the training steps can be performed again to obtain an updated track-ground information classification model, so as to adapt to changes in the mode of expression of current alarm receiving and processing texts and to extract new kinds of track-ground information.

According to the method provided by the embodiments of the present disclosure, the track-ground information in the alarm receiving and processing text is extracted automatically by using the track-ground information classification model. No manual operation is needed, which reduces the cost of extracting track-ground information from alarm receiving and processing text and increases the extraction speed.

With further reference to fig. 4, as an implementation of the method shown in the above figures, the present disclosure provides an embodiment of a deep learning model-based apparatus for extracting track-ground information from alarm receiving and processing text. The apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus may be applied to various electronic devices.

As shown in fig. 4, the deep learning model-based apparatus 400 for extracting track-ground information from alarm receiving and processing text of the present embodiment includes: an acquisition unit 401, a word segmentation unit 402, a classification unit 403 and a determination unit 404. The acquisition unit 401 is configured to acquire the alarm receiving and processing text from which track-ground information is to be extracted; the word segmentation unit 402 is configured to segment that text to obtain a corresponding word segmentation sequence; the classification unit 403 is configured to perform, for each participle in the obtained word segmentation sequence, the following track-ground information classification operation: inputting a word vector corresponding to the participle into a track-ground information classification model to obtain a classification result of whether the participle is track-ground information, wherein the track-ground information classification model is obtained by pre-training based on a deep learning model; and the determination unit 404 is configured to determine the track-ground information set corresponding to the text from the participles in the word segmentation sequence whose classification results indicate track-ground information.

In this embodiment, for the specific processing of the acquisition unit 401, the word segmentation unit 402, the classification unit 403 and the determination unit 404 of the apparatus 400 and the technical effects thereof, reference may be made to the descriptions of step 201, step 202, step 203 and step 204 in the embodiment corresponding to fig. 2, respectively, which are not repeated here.
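The cooperation of the four units can be sketched as a small pipeline. The class and the three pluggable callables (`segment`, `embed`, `classify`) are hypothetical, and the toy stand-ins at the bottom merely make the sketch runnable; they are not the segmentation method, word vectors, or classification model of the disclosure.

```python
from typing import Callable, List, Sequence


class TrackGroundExtractor:
    def __init__(self,
                 segment: Callable[[str], List[str]],
                 embed: Callable[[str], Sequence[float]],
                 classify: Callable[[Sequence[float]], bool]):
        self.segment = segment
        self.embed = embed
        self.classify = classify

    def acquire(self, text: str) -> str:
        """Acquisition unit 401: obtain the text to be processed."""
        return text

    def extract(self, text: str) -> List[str]:
        tokens = self.segment(self.acquire(text))                # word segmentation unit 402
        flags = [self.classify(self.embed(t)) for t in tokens]   # classification unit 403
        return [t for t, keep in zip(tokens, flags) if keep]     # determination unit 404


# Toy stand-ins (assumptions for illustration only).
toy_vectors = {"hotel": [1.0], "park": [1.0]}
extractor = TrackGroundExtractor(
    segment=lambda text: text.split(),
    embed=lambda token: toy_vectors.get(token, [0.0]),
    classify=lambda vec: vec[0] > 0.5,
)
result = extractor.extract("seen at hotel then park")
```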

In some optional implementation manners of this embodiment, the track-based information classification model based on the deep learning model may be obtained by training in advance through the following training steps: acquiring a training sample set, wherein the training sample comprises a word segmentation sequence obtained by segmenting a historical alarm receiving and processing text and a labeling information sequence corresponding to the word segmentation sequence, and the labeling information is used for indicating whether corresponding words in the word segmentation sequence are track ground information or not; determining each training sample of the corresponding participle sequence in the training sample set, which comprises track ground information participles, as a positive sample set, wherein the track ground information participles are the participles of which the corresponding labeled information in the participle sequence indicates that the participles are track ground information; determining a text feature vector of each positive sample according to each track information participle included in the participle sequence of each positive sample in the positive sample set; and training an initial deep learning model by taking the text feature vector of the positive sample in the positive sample set as an input and taking a classification result indicating track ground information as a corresponding expected output, so as to obtain the track ground information classification model.

In some optional implementation manners of this embodiment, the training step may further include: inputting preset negative sample feature vectors into the track ground information classification model to obtain corresponding actual output results; and adjusting the model parameters of the track-based information classification model according to the difference between the obtained actual output result and the classification result indicating the information which is not the track-based information.

In some optional implementations of this embodiment, determining the text feature vector of each positive sample according to the track-ground information participles included in the word segmentation sequence of each positive sample in the positive sample set may include: for each positive sample in the positive sample set, performing the following vector generation and assignment operations: generating a text feature vector corresponding to the positive sample, wherein the components of the generated text feature vector correspond one-to-one to the words in a preset dictionary; for each track-ground information participle in the word segmentation sequence of the positive sample, setting the component corresponding to that participle in the generated text feature vector to the term frequency-inverse document frequency (TF-IDF) index of the participle; and setting each unassigned component in the generated text feature vector to a preset numerical value, wherein an unassigned component is a component corresponding to a word that belongs to the preset dictionary but is not any track-ground information participle in the word segmentation sequence of the positive sample.

It should be noted that, for details and technical effects of implementation of each unit in the alarm receiving and processing text trajectory information extraction device based on the deep learning model provided in the embodiment of the present disclosure, reference may be made to descriptions of other embodiments in the present disclosure, and details are not described here again.

Referring now to FIG. 5, shown is a block diagram of a computer system 500 suitable for use in implementing the electronic devices of embodiments of the present disclosure. The electronic device shown in fig. 5 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.

As shown in fig. 5, the computer system 500 includes a Central Processing Unit (CPU) 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. In the RAM 503, various programs and data necessary for the operation of the system 500 are also stored. The CPU 501, ROM 502, and RAM 503 are connected to each other via a bus 504. An Input/Output (I/O) interface 505 is also connected to bus 504.

The following components are connected to the I/O interface 505: an input section 506 including a touch screen, a tablet, a keyboard, a mouse, or the like; an output section 507 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like; a storage portion 508 including a hard disk and the like; and a communication section 509 including a Network interface card such as a LAN (Local Area Network) card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509. The above-described functions defined in the method of the present disclosure are performed when the computer program is executed by a Central Processing Unit (CPU) 501. It should be noted that the computer readable medium in the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In contrast, in the present disclosure, a computer-readable signal medium may include a propagated data signal with computer-readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, C++ and Python, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor including an acquisition unit, a word segmentation unit, a classification unit, and a determination unit. The names of the units do not, in some cases, constitute a limitation on the units themselves; for example, the acquisition unit may also be described as "a unit for acquiring the alarm receiving and processing text from which track-ground information is to be extracted".

As another aspect, the present disclosure also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquiring a track ground information receiving and processing alarm text to be extracted; performing word segmentation on the track ground information alarm receiving and processing text to be extracted to obtain a corresponding word segmentation sequence; for each participle in the obtained participle sequence, performing the following trajectory-based information classification operation: inputting a word vector corresponding to the word segmentation into a track-ground information classification model to obtain a classification result of whether the word segmentation is track-ground information, wherein the track-ground information classification model is obtained by pre-training based on a deep learning model; and determining a track ground information set corresponding to the track ground information alarm receiving text to be extracted for each participle indicating track ground information according to the corresponding classification result in the participle sequence.

The foregoing description is only of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above features, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept defined above, for example, technical solutions formed by replacing the above features with (but not limited to) features having similar functions disclosed in the present disclosure.

Claims

1. A deep learning model-based alarm receiving and processing text trajectory ground information extraction method comprises the following steps:

acquiring a track ground information receiving and processing alarm text to be extracted;

performing word segmentation on the track ground information alarm receiving and processing text to be extracted to obtain a corresponding word segmentation sequence;

for each participle in the obtained participle sequence, performing the following trajectory-based information classification operation: inputting a word vector corresponding to the word segmentation into a track-ground information classification model to obtain a classification result of whether the word segmentation is track-ground information, wherein the track-ground information classification model is obtained by pre-training based on a deep learning model;

and determining a track ground information set corresponding to the track ground information alarm receiving text to be extracted for each participle used for indicating track ground information according to the corresponding classification result in the participle sequence.

2. The method according to claim 1, wherein the deep learning model-based trajectory-ground information classification model is obtained by training in advance through the following training steps:

acquiring a training sample set, wherein the training sample comprises a word segmentation sequence obtained by segmenting a historical alarm receiving and processing text and a labeling information sequence corresponding to the word segmentation sequence, and the labeling information is used for indicating whether corresponding words in the word segmentation sequence are track ground information or not;

determining each training sample of the corresponding participle sequence in the training sample set, which comprises track ground information participles, as a positive sample set, wherein the track ground information participles are participles of which the corresponding labeled information in the participle sequence indicates that the participles are track ground information;

determining a text feature vector of each positive sample according to each track information participle included in the participle sequence of each positive sample in the positive sample set;

and training an initial deep learning model by taking the text feature vector of the positive sample in the positive sample set as an input and taking a classification result indicating that the classification result is track ground information as a corresponding expected output to obtain the track ground information classification model.

3. The method of claim 2, wherein the training step further comprises:

inputting preset negative sample feature vectors into the track ground information classification model to obtain corresponding actual output results;

and adjusting the model parameters of the track ground information classification model according to the difference between the obtained actual output result and the classification result indicating that the input is not track ground information.

4. The method according to claim 2 or 3, wherein the determining the text feature vector of each positive sample according to each track ground information participle included in the participle sequence of each positive sample in the positive sample set comprises:

for each positive sample in the set of positive samples, performing the following vector generation and assignment operations: generating a text feature vector corresponding to the positive sample, wherein the components of the generated text feature vector correspond one-to-one to the words in a preset dictionary; for each track ground information participle in the participle sequence of the positive sample, setting the component corresponding to the track ground information participle in the generated text feature vector as the term frequency-inverse document frequency (TF-IDF) index of the track ground information participle; and setting each unassigned component in the generated text feature vector as a preset numerical value, wherein an unassigned component is a component corresponding to a word that belongs to the preset dictionary but is not any track ground information participle in the participle sequence of the positive sample.

5. A device for extracting track ground information from alarm receiving and processing text based on a deep learning model, comprising:

an acquisition unit configured to acquire an alarm receiving and processing text from which track ground information is to be extracted;

a word segmentation unit configured to perform word segmentation on the alarm receiving and processing text from which track ground information is to be extracted, to obtain a corresponding participle sequence;

a classification unit configured to perform, for each participle in the obtained participle sequence, the following track ground information classification operation: inputting a word vector corresponding to the participle into a track ground information classification model to obtain a classification result indicating whether the participle is track ground information, wherein the track ground information classification model is obtained by pre-training based on a deep learning model;

and a determining unit configured to determine, from the participles in the participle sequence whose corresponding classification results indicate track ground information, the track ground information set corresponding to the alarm receiving and processing text from which track ground information is to be extracted.
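The cooperation of the four units can be sketched as a small pipeline. The sketch is hedged throughout: `extract_place_info` and the toy helpers `toy_segment`, `toy_embed`, and `toy_classify` are hypothetical stand-ins, not the segmenter, word vectors, or trained classification model of the disclosure.

```python
from typing import Callable, List, Sequence


def extract_place_info(
    text: str,
    segment: Callable[[str], List[str]],
    embed: Callable[[str], Sequence[float]],
    classify: Callable[[Sequence[float]], bool],
) -> List[str]:
    """Segment the text, classify each participle, and collect the hits."""
    tokens = segment(text)                # word segmentation unit
    result: List[str] = []
    for token in tokens:                  # classification unit
        if classify(embed(token)) and token not in result:
            result.append(token)          # determining unit
    return result


# Toy stand-ins, for illustration only.
GAZETTEER = {"Chaoyang", "Park"}


def toy_segment(text: str) -> List[str]:
    return text.split()


def toy_embed(token: str) -> List[float]:
    return [1.0 if token in GAZETTEER else 0.0]


def toy_classify(vector: Sequence[float]) -> bool:
    return vector[0] > 0.5
```

Passing the segmenter, embedding lookup, and classifier as arguments keeps the sketch independent of any particular model; a real system would substitute a Chinese word segmenter and the pre-trained classification model.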

6. The apparatus of claim 5, wherein the track ground information classification model is obtained by pre-training, based on the deep learning model, through the following training steps:

7. The apparatus of claim 6, wherein the training step further comprises:

8. The apparatus according to claim 6 or 7, wherein the determining the text feature vector of each positive sample according to the track ground information participles included in the participle sequence of the positive sample in the positive sample set comprises:

9. An electronic device, comprising:

one or more processors;

a storage device on which one or more programs are stored;

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.

10. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1-4.