CN112541075B - Standard case sending time extraction method and system for alert text - Google Patents
- Publication number
- CN112541075B (application number CN202011195667.3A)
- Authority
- CN
- China
- Prior art keywords
- time
- text
- elements
- clause
- day
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/10—Services
- G06Q50/26—Government or public services
Abstract
The invention discloses a method and system for extracting the standard case occurrence time from alert text, and belongs to the technical field of alert text extraction. The method comprises the following steps: sequentially extracting the time elements in the alert text by named entity recognition; segmenting the alert text into a plurality of text clauses, and constructing key-value pairs of the text clauses and the time elements; establishing and training a case occurrence time identification model, and identifying the content expressed in each text clause with this model to determine the case occurrence time; standardizing the determined case occurrence time; and merging the standardized case occurrence times and marking the merged result. On top of the time elements extracted by named entity recognition, the invention adds a case occurrence time identification model that accurately identifies and extracts case occurrence time information, which supports police officers in analyzing and verifying alerts quickly and accurately.
Description
Technical Field
The invention belongs to the technical field of alert text extraction, and particularly relates to a method and system for extracting the standard case occurrence time from alert text.
Background
Techniques for extracting time elements from text are mature: named entity recognition, regular expressions, sequence labeling models and similar methods all achieve good results. A regular expression matches the text against fixed time-expression templates; a sequence labeling model relies on text data labeled in advance, from which the machine learns the characteristics of time elements in the text sequence through manual labels.
However, current technology does not address how, in a police system, to distinguish the attribute of each time element in an alert text, convert it into a standard time format, and reason about the relations among multiple times. The time elements in an alert text are divided into the alarm time, the case occurrence time, other background times, and so on, where the case occurrence time is a time period or a time point in a specific scenario. Existing models have difficulty accurately extracting the case occurrence time from alert text, which greatly increases the workload of police officers.
Disclosure of Invention
The invention provides a method and system for extracting the standard case occurrence time from alert text, in order to solve the above problems in the prior art.
To achieve this purpose, the invention adopts the following technical scheme:
A method for extracting the standard case occurrence time from alert text comprises the following steps:
Step 1: sequentially extract the time elements in the alert text by named entity recognition;
Step 2: segment the alert text into a plurality of text clauses, and construct key-value pairs of the text clauses and the time elements;
Step 3: establish and train a case occurrence time identification model, and identify the content expressed in each text clause with this model to determine the case occurrence time;
Step 4: standardize the determined case occurrence time;
Step 5: merge the standardized case occurrence times, and mark the merged result.
In a further embodiment, step 1 extracts the time elements with a regular expression, as follows:
Step 11: first remove the bracketed content from the alert text, which eliminates the interfering time elements contained in the brackets;
Step 12: then extract the time elements in the text with the following regular expression, in which the English words stand for the corresponding Chinese characters of the original expression: ([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|before)?[\u4E00-\u9FA5]?(night|early|morning|afternoon|evening)?[\u4E00-\u9FA5]?([0-9]{0,2}[hour|point])?([0-9]{0,2}minute);
wherein:
([0-9]{4}year) represents four digits plus "year", for matching the year;
([0-9]{1,2}month) represents one or two digits plus "month", for matching the month;
([0-9]{1,2}day) represents one or two digits plus "day", for matching the day;
(today|yesterday|before)[\u4E00-\u9FA5] matches the relative date descriptions "today", "yesterday" and "the day before yesterday";
(night|early|morning|afternoon|evening)[\u4E00-\u9FA5] matches period descriptions such as "evening", "morning" and "afternoon";
([0-9]{1,2}[hour|point]) represents one or two digits plus "hour" or "point" (o'clock), for matching a specific hour;
([0-9]{1,2}minute) represents one or two digits plus "minute", for matching a specific minute.
In a further embodiment, step 2 further comprises:
first, arranging the extracted time elements in the order in which they appear in the alert text, and taking the first one as the alarm time;
then, segmenting the alert text into a plurality of text clauses by regular-expression matching on punctuation;
finally, determining the text clause in which each time element other than the alarm time is located; if a text clause contains a time element and its left and right neighboring clauses do not, the neighboring clauses without a time element are combined with the clause containing the time element to form a new text clause; key-value pairs are then constructed so that each time element corresponds to one text clause.
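The clause segmentation and pairing described above can be sketched in Python roughly as follows. This is a minimal illustration, not the patented implementation: the punctuation set, the function names `split_clauses` and `pair_times_with_clauses`, the Chinese sample sentence (including the name 张三) and the choice to merge each neighboring clause independently are all assumptions made for the example.

```python
import re

# Assumed punctuation set used to cut the alert text into clauses.
PUNCT = re.compile(r"[，。；：！？,;:!?]")

def split_clauses(text):
    """Return (start, end, clause) spans separated by punctuation."""
    spans, start = [], 0
    for m in PUNCT.finditer(text):
        if m.start() > start:
            spans.append((start, m.start(), text[start:m.start()]))
        start = m.end()
    if start < len(text):
        spans.append((start, len(text), text[start:]))
    return spans

def pair_times_with_clauses(text, time_elements):
    """Pair every time element except the first (the alarm time) with its clause.

    time_elements must be listed in order of appearance in the text.
    """
    spans = split_clauses(text)
    owner, pos = [], 0
    for t in time_elements:
        at = text.index(t, pos)            # locate elements left to right
        pos = at + len(t)
        owner.append(next(i for i, (s, e, _) in enumerate(spans) if s <= at < e))
    has_time = set(owner)
    pairs = []
    for t, i in zip(time_elements[1:], owner[1:]):
        parts = []
        if i > 0 and (i - 1) not in has_time:            # left neighbor without a time
            parts.append(spans[i - 1][2])
        parts.append(spans[i][2])
        if i + 1 < len(spans) and (i + 1) not in has_time:  # right neighbor without a time
            parts.append(spans[i + 1][2])
        pairs.append((t, "，".join(parts)))
    return time_elements[0], pairs

# Hypothetical sample, bracketed content already removed.
SAMPLE = ("2020年07月27日10时40分，张三报警称：2020年7月21日晚7点，"
          "报警人张三的银行卡被盗刷1笔，损失100元；"
          "2020年7月25日6时49分至7月25日8时10分被盗刷2笔，共损失200元。"
          "该人于2020年7月26日上午9点到银行挂失。")
TIMES = ["2020年07月27日10时40分", "2020年7月21日晚7点",
         "2020年7月25日6时49分", "7月25日8时10分", "2020年7月26日上午9点"]

alarm_time, pairs = pair_times_with_clauses(SAMPLE, TIMES)
```

The pairs are kept as a list of tuples rather than a dictionary so that two identical time strings in one alert would not overwrite each other; two time elements found in the same clause simply share the same merged clause.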
In a further embodiment, the case occurrence time identification model in step 3 comprises a pre-training model and a discrimination model;
the pre-training model establishes a database whose training data are derived from historical alert data in which the case occurrence time has been marked manually; the case occurrence time in an alert text is determined by comparing the text clauses containing time elements with the training data; the text clause data are automatically marked after discrimination and added to the database;
the discrimination model comprises an input layer, a hidden layer and an output layer; the input layer receives the text clauses containing time elements obtained by segmenting the alert text, and its number of nodes is the number of such text clauses; the hidden layer holds the data newly added to the database during pre-training together with the original data in the database; the output layer determines by comparison whether the time element in each text clause is a case occurrence time, and its number of nodes equals the number of text clauses to be discriminated; when an input falls outside the coverage of the training database, the input text clause is processed manually and the processed data are fed back into the database, so that the data of the hidden layer grow as training proceeds;
The discrimination model performs an error calculation on the discrimination result, yielding a value $H(P, Q)$,
wherein $X_{ij}$ is a text clause sample containing a time element, $P(X_{ij})$ is the probability that the time element in the text clause is a case occurrence time, $Q(X_{ij})$ is the probability that it is not, with $P(X_{ij}) + Q(X_{ij}) = 1$; $m$ is the number of nodes of the hidden layer and $N$ is the number of text clause samples containing time elements; the smaller the value of $H(P, Q)$, the smaller the error of the discrimination result.
In a further embodiment, step 4 further comprises:
Step 41: directly determine the "year, month, day, hour" elements with the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}[hour|point]", then execute the next step;
Step 42: if "night", "afternoon" or "evening" appears in the text of the time element and the hour obtained with "[0-9]{0,2}[hour|point]" is less than 12, add 12 to the hour;
Step 43: if the "day" element is missing and the time element contains "today", "yesterday" or "before", obtain the corresponding "day" element by counting back 0, 1 or 2 days respectively from the alarm time;
Step 44: if a single element among "year, month, day, hour" is missing, fill it in with the corresponding element of the previous time element;
Step 45: normalize the time elements into a standard case occurrence time in the 10-digit format "yyyymmddhh".
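Steps 41 to 45 can be illustrated with the following Python sketch. It is only an approximation under stated assumptions: the Chinese characters (年, 月, 日, 时, 点, 晚, 下午, 傍晚, 今, 昨, 前) are a back-translation of the translated expressions, the names `normalize_one` and `normalize_all` are hypothetical, and seeding the "previous time element" of the first item with the alarm time is a choice made for the example.

```python
import re
from datetime import datetime, timedelta

# Step 41: one pattern per element (assumed Chinese back-translation).
FIELDS = {
    "year": re.compile(r"([0-9]{4})年"),
    "month": re.compile(r"([0-9]{1,2})月"),
    "day": re.compile(r"([0-9]{1,2})日"),
    "hour": re.compile(r"([0-9]{1,2})[时点]"),
}
PM_WORDS = ("晚", "下午", "傍晚")           # "night", "afternoon", "evening"
RELATIVE_DAYS = {"今": 0, "昨": 1, "前": 2}  # "today", "yesterday", "before"

def normalize_one(element, alarm, previous):
    parts = {}
    for name, rx in FIELDS.items():
        m = rx.search(element)
        parts[name] = int(m.group(1)) if m else None
    # Step 42: shift afternoon and evening hours into 24-hour time.
    if parts["hour"] is not None and parts["hour"] < 12 \
            and any(w in element for w in PM_WORDS):
        parts["hour"] += 12
    # Step 43: resolve a relative day by counting back from the alarm time.
    if parts["day"] is None:
        for word, back in RELATIVE_DAYS.items():
            if word in element:
                d = alarm - timedelta(days=back)
                parts.update(year=d.year, month=d.month, day=d.day)
                break
    # Step 44: fill any element still missing from the previous time element.
    for name in parts:
        if parts[name] is None:
            parts[name] = previous[name]
    return parts

def normalize_all(elements, alarm):
    previous = {"year": alarm.year, "month": alarm.month,
                "day": alarm.day, "hour": alarm.hour}
    out = []
    for element in elements:
        parts = normalize_one(element, alarm, previous)
        # Step 45: 10-digit yyyymmddhh.
        out.append("{year:04d}{month:02d}{day:02d}{hour:02d}".format(**parts))
        previous = parts
    return out

standard = normalize_all(
    ["2020年7月21日晚7点", "2020年7月25日6时49分", "7月25日8时10分",
     "2020年7月26日上午9点", "昨天下午3点"],
    datetime(2020, 7, 27, 10, 40),
)
```

Minutes are discarded because the target format stops at the hour. The last input, "昨天下午3点", is an invented example added to exercise steps 42 and 43 together.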
In a further embodiment, step 5 further comprises:
Step 51: judge whether two adjacent time elements appear in the same text clause; when they do and the former time is earlier than the latter, merge the standard case occurrence times of the two time elements into a case occurrence time period; otherwise execute the next step;
Step 52: calculate the difference in hours between the two adjacent time elements; when they differ by less than 24 hours and the former time is earlier than the latter, merge the standard case occurrence times of the two time elements into a case occurrence time period; otherwise execute the next step;
Step 53: search for keywords in the text clauses; when the text clause of the former of two adjacent time elements contains a keyword such as "start", the text clause of the latter contains a keyword such as "end", and the former time is earlier than the latter, merge the standard case occurrence times of the two time elements into a case occurrence time period; otherwise execute the next step;
Step 54: designate the standard case occurrence times of the remaining time elements as case occurrence time points;
Step 55: mark the case occurrence time periods and case occurrence time points in chronological order.
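The merging rules of steps 51 to 55 might be sketched in Python as below. The sketch assumes that each input item is a pair of a 10-digit standard time and its text clause, that "same clause" can be tested by comparing the clause strings, that merging proceeds greedily over adjacent pairs, and that the keyword lists `START_WORDS` and `END_WORDS` are illustrative guesses; none of these details is fixed by the text.

```python
from datetime import datetime, timedelta

START_WORDS = ("开始", "起")  # assumed "start" keywords
END_WORDS = ("结束", "止")    # assumed "end" keywords

def parse(ts):
    """Parse a 10-digit yyyymmddhh string."""
    return datetime.strptime(ts, "%Y%m%d%H")

def should_merge(former, latter):
    (t1, clause1), (t2, clause2) = former, latter
    if not parse(t1) < parse(t2):        # every rule needs former earlier than latter
        return False
    if clause1 == clause2:               # step 51: same text clause
        return True
    if parse(t2) - parse(t1) < timedelta(hours=24):  # step 52: under 24 hours apart
        return True
    # Step 53: start keyword in the former clause, end keyword in the latter.
    return (any(w in clause1 for w in START_WORDS)
            and any(w in clause2 for w in END_WORDS))

def merge_times(items):
    """Return 1-tuples (time points) and 2-tuples (time periods), sorted by time."""
    results, i = [], 0
    while i < len(items):
        if i + 1 < len(items) and should_merge(items[i], items[i + 1]):
            results.append((items[i][0], items[i + 1][0]))
            i += 2
        else:
            results.append((items[i][0],))   # step 54: remaining element is a point
            i += 1
    return sorted(results)                   # step 55: chronological order

clause_a = "张三报警称，2020年7月21日晚7点，报警人张三的银行卡被盗刷1笔"
clause_b = "损失100元，2020年7月25日6时49分至7月25日8时10分被盗刷2笔，共损失200元"
merged = merge_times([("2020072119", clause_a),
                      ("2020072506", clause_b),
                      ("2020072508", clause_b)])
```

Because the standard times are fixed-width digit strings, sorting the tuples lexicographically is equivalent to sorting them chronologically.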
A system for extracting the standard case occurrence time from alert text comprises:
a first module for sequentially extracting the time elements in the alert text by named entity recognition;
a second module for segmenting the alert text into a plurality of text clauses and constructing key-value pairs of the text clauses and the time elements;
a third module for establishing and training a case occurrence time identification model, and identifying the content expressed in each text clause with this model to determine the case occurrence time;
a fourth module for standardizing the determined case occurrence time;
and a fifth module for merging the standardized case occurrence times and marking the merged result.
In a further embodiment, the first module extracts the time elements with a regular expression: it first removes the bracketed content from the alert text, which eliminates the interfering time elements contained in the brackets, and then extracts the time elements in the text with the following regular expression, in which the English words stand for the corresponding Chinese characters of the original expression:
([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|before)?[\u4E00-\u9FA5]?(night|early|morning|afternoon|evening)?[\u4E00-\u9FA5]?([0-9]{0,2}[hour|point])?([0-9]{0,2}minute);
wherein:
([0-9]{4}year) represents four digits plus "year", for matching the year;
([0-9]{1,2}month) represents one or two digits plus "month", for matching the month;
([0-9]{1,2}day) represents one or two digits plus "day", for matching the day;
(today|yesterday|before)[\u4E00-\u9FA5] matches the relative date descriptions "today", "yesterday" and "the day before yesterday";
(night|early|morning|afternoon|evening)[\u4E00-\u9FA5] matches period descriptions such as "evening", "morning" and "afternoon";
([0-9]{1,2}[hour|point]) represents one or two digits plus "hour" or "point" (o'clock), for matching a specific hour;
([0-9]{1,2}minute) represents one or two digits plus "minute", for matching a specific minute;
the second module first arranges the extracted time elements in the order in which they appear in the alert text and takes the first one as the alarm time; it then segments the alert text into a plurality of text clauses by regular-expression matching on punctuation; finally, it determines the text clause in which each time element other than the alarm time is located; if a text clause contains a time element and its left and right neighboring clauses do not, the neighboring clauses without a time element are combined with the clause containing the time element to form a new text clause; key-value pairs are then constructed so that each time element corresponds to one text clause.
The third module establishes and trains a case occurrence time identification model, which comprises a pre-training model and a discrimination model;
the pre-training model establishes a database whose training data are derived from historical alert data in which the case occurrence time has been marked manually; the case occurrence time in an alert text is determined by comparing the text clauses containing time elements with the training data; the text clause data are automatically marked after discrimination and added to the database;
the discrimination model comprises an input layer, a hidden layer and an output layer; the input layer receives the text clauses containing time elements obtained by segmenting the alert text, and its number of nodes is the number of such text clauses; the hidden layer holds the data newly added to the database during pre-training together with the original data in the database; the output layer determines by comparison whether the time element in each text clause is a case occurrence time, and its number of nodes equals the number of text clauses to be discriminated; when an input falls outside the coverage of the training database, the input text clause is processed manually and the processed data are fed back into the database, so that the data of the hidden layer grow as training proceeds;
The discrimination model performs an error calculation on the discrimination result, yielding a value $H(P, Q)$,
wherein $X_{ij}$ is a text clause sample containing a time element, $P(X_{ij})$ is the probability that the time element in the text clause is a case occurrence time, $Q(X_{ij})$ is the probability that it is not, with $P(X_{ij}) + Q(X_{ij}) = 1$; $m$ is the number of nodes of the hidden layer and $N$ is the number of text clause samples containing time elements; the smaller the value of $H(P, Q)$, the smaller the error of the discrimination result.
In a further embodiment, the fourth module first determines the "year, month, day, hour" elements directly with the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}[hour|point]"; when "night", "afternoon" or "evening" appears in the text of the time element and the hour obtained with "[0-9]{0,2}[hour|point]" is less than 12, 12 is added to the hour; when the "day" element is missing and the time element contains "today", "yesterday" or "before", the corresponding "day" element is obtained by counting back 0, 1 or 2 days respectively from the alarm time; when a single element among "year, month, day, hour" is missing, it is filled in with the corresponding element of the previous time element; finally, the time elements are normalized into a standard case occurrence time in the 10-digit format "yyyymmddhh";
The fifth module first judges whether two adjacent time elements appear in the same text clause; when they do and the former time is earlier than the latter, the standard case occurrence times of the two time elements are merged into a case occurrence time period; it then calculates the difference in hours between two adjacent time elements, and when they differ by less than 24 hours and the former time is earlier than the latter, the standard case occurrence times of the two time elements are merged into a case occurrence time period; it then searches for keywords in the text clauses, and when the text clause of the former of two adjacent time elements contains a keyword such as "start", the text clause of the latter contains a keyword such as "end", and the former time is earlier than the latter, the standard case occurrence times of the two time elements are merged into a case occurrence time period; the standard case occurrence times of the remaining time elements are designated as case occurrence time points; finally, the case occurrence time periods and case occurrence time points are marked in chronological order.
A computer processing system comprises a storage module storing a computer program for the standard case occurrence time extraction method for alert text of any of the embodiments described above.
The beneficial effects are as follows. First, a case occurrence time identification model is added on top of the time elements extracted by named entity recognition, so that case occurrence time information is accurately identified and extracted, which helps police officers analyze and verify alerts quickly;
second, the alert text is segmented into a plurality of text clauses containing time elements, key-value pairs of the text clauses and the time elements are constructed, and semantic recognition of each text clause determines whether its time element is a case occurrence time, which reduces the cases in which the case occurrence time is hard to identify and extract because the alert text is complex;
finally, the case occurrence times are merged and marked as case occurrence time points and case occurrence time periods, which supports police officers in analyzing alerts quickly and accurately.
Drawings
FIG. 1 is a flow chart of the standard case occurrence time extraction for alert text according to the present invention.
Fig. 2 is a schematic structural diagram of the discrimination model of the present invention.
Detailed Description
The technical scheme of the invention will be clearly and completely described below with reference to the accompanying drawings and examples. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.
Research shows that the time elements in alert text are divided into the alarm time, the case occurrence time, other background times, and so on. Existing public security alert systems have difficulty distinguishing the attributes of the time elements in alert text and reasoning about the relations among multiple times, and therefore cannot accurately identify and extract the case occurrence time. This work generally requires police officers to identify and mark the times manually, which greatly increases their workload.
Example 1: as shown in FIG. 1, to solve the problems in the prior art, embodiment 1 of the present invention provides a method for extracting the standard case occurrence time from alert text, comprising the following steps:
Step 1: sequentially extract the time elements in the alert text by named entity recognition;
Step 2: segment the alert text into a plurality of text clauses, and construct key-value pairs of the text clauses and the time elements;
Step 3: establish and train a case occurrence time identification model, and identify the content expressed in each text clause with this model to determine the case occurrence time;
Step 4: standardize the determined case occurrence time;
Step 5: merge the standardized case occurrence times, and mark the merged result.
Alert text has complex content and contains many time elements with different attributes. For example, the time elements in an alert text are mainly divided into the alarm time, the case occurrence time, other background times, and so on. Time elements with such different attributes greatly increase the difficulty of extracting the case occurrence time. To describe the embodiments in more detail, the present application provides a short sample alert text: "At 10:40 on 27 July 2020, a report was received stating: at 7 o'clock in the evening of 21 July 2020, the bank card of the reporting person (registered residence: xxx, ID card number: xxx, date of birth: xxx) was fraudulently charged once, with a loss of 100 yuan; from 6:49 on 25 July 2020 to 8:10 on 25 July, it was fraudulently charged twice, with a total loss of 200 yuan. The person reported the card lost at the bank at 9 a.m. on 26 July 2020." In this text, "10:40 on 27 July 2020" is the alarm time; "7 o'clock in the evening of 21 July 2020" and "6:49 on 25 July 2020 to 8:10 on 25 July" are case occurrence times; and the date of birth of the reporting person is other background time. The complex attributes of the time elements greatly increase the difficulty of identifying the case occurrence time. In addition, some time elements in alert text are written in a non-standard format, such as the colloquial "evening 7 o'clock", which further increases the difficulty of identifying them.
Therefore, to determine the attribute of each time element accurately, the useful time elements in the alert text must first be extracted accurately. Step 1 extracts the time elements with a regular expression, as follows:
first, remove the bracketed content from the alert text, which eliminates the interfering time elements contained in the brackets. For example, the personal details of the reporting person given in brackets usually include a date of birth, which interferes with the extraction of time elements and should be eliminated first;
then, extract the time elements in the text with the following regular expression, in which the English words stand for the corresponding Chinese characters of the original expression:
([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|before)?[\u4E00-\u9FA5]?(night|early|morning|afternoon|evening)?[\u4E00-\u9FA5]?([0-9]{0,2}[hour|point])?([0-9]{0,2}minute);
wherein:
([0-9]{4}year) represents four digits plus "year", for matching the year;
([0-9]{1,2}month) represents one or two digits plus "month", for matching the month;
([0-9]{1,2}day) represents one or two digits plus "day", for matching the day;
(today|yesterday|before)[\u4E00-\u9FA5] matches the relative date descriptions "today", "yesterday" and "the day before yesterday"; the wildcard for a single Chinese character widens the range of words matched and brings the pattern closer to the way reporting persons describe dates in everyday speech;
(night|early|morning|afternoon|evening)[\u4E00-\u9FA5] matches period descriptions such as "evening", "morning" and "afternoon"; the wildcard for a single Chinese character widens the range of words matched and handles the non-standard time elements that colloquial speech introduces into alert text; it avoids, for example, extracting "evening 7 o'clock" merely as "7 o'clock", which ensures the accuracy of time element extraction;
([0-9]{1,2}[hour|point]) represents one or two digits plus "hour" or "point" (o'clock), for matching a specific hour;
([0-9]{1,2}minute) represents one or two digits plus "minute", for matching a specific minute;
processing the sample alert text with the regular expression extracts, in order, the time elements "10:40 on 27 July 2020", "7 o'clock in the evening of 21 July 2020", "6:49 on 25 July 2020", "8:10 on 25 July" and "9 a.m. on 26 July 2020".
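The bracket removal and regular-expression extraction described above can be tried out with the Python sketch below. The Chinese characters in `TIME_RE` are an assumed back-translation of the translated expression, the sample sentence (including the name 张三 and the date of birth) is invented to mirror the example, and the helper names are hypothetical. The sketch also departs from the expression as printed in two ways: every group is made optional, and the digit counts for hour and minute use {1,2}.

```python
import re

# Assumed back-translation of the translated expression into Chinese characters.
TIME_RE = re.compile(
    r"([0-9]{4}年)?([0-9]{1,2}月)?([0-9]{1,2}日)?"
    r"(今|昨|前)?[\u4E00-\u9FA5]?"
    r"(晚|早|上午|下午|傍晚)?[\u4E00-\u9FA5]?"
    r"([0-9]{1,2}[时点])?([0-9]{1,2}分)?"
)
# Full-width and half-width brackets with their content.
BRACKETS = re.compile(r"（[^（）]*）|\([^()]*\)")

def strip_brackets(text):
    """Drop bracketed content, which may contain interfering times."""
    return BRACKETS.sub("", text)

def extract_time_elements(text):
    found = []
    for m in TIME_RE.finditer(strip_brackets(text)):
        # Every group is optional, so the wildcards alone can match ordinary
        # characters; keep only matches that captured a numeric group.
        if any(m.group(i) for i in (1, 2, 3, 6, 7)):
            found.append(m.group(0))
    return found

SAMPLE = ("2020年07月27日10时40分，张三报警称：2020年7月21日晚7点，"
          "报警人张三（户籍地：xxx，身份证号：xxx，出生日期：1990年1月1日）"
          "的银行卡被盗刷1笔，损失100元；"
          "2020年7月25日6时49分至7月25日8时10分被盗刷2笔，共损失200元。"
          "该人于2020年7月26日上午9点到银行挂失。")

elements = extract_time_elements(SAMPLE)
```

Filtering on the numeric groups is a design choice of this sketch: without it, the two single-character wildcards would report ordinary words as matches. Note also that a wildcard can absorb a period word such as 晚 before the period group is tried, so later steps should look for period words in the matched text rather than rely on that capture group.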
Because alert text is complex and contains many time elements, judging the case occurrence time directly on the whole text is difficult. The alert text is therefore segmented into a plurality of text clauses, and semantic recognition is performed on each text clause to judge whether its time element is a case occurrence time. Thus, in a further embodiment, step 2 is performed as follows:
first, arrange the extracted time elements in the order in which they appear in the alert text, and take the first one as the alarm time; for the sample alert text, the alarm time is "10:40 on 27 July 2020";
then, segment the alert text into a plurality of text clauses by regular-expression matching on punctuation; some of the resulting clauses contain no time element, but their content is still needed to analyze whether a time element is a case occurrence time, so they cannot simply be discarded; instead, clauses without a time element are merged into clauses with a time element, which completes the context of the latter and allows the case occurrence time to be judged accurately;
finally, determine the text clause in which each time element other than the alarm time is located; if a text clause contains a time element and its left and right neighboring clauses do not, the neighboring clauses without a time element are combined with the clause containing the time element to form a new text clause; key-value pairs are then constructed so that each time element other than the alarm time is paired with the text clause that contains it.
if the time element is to be judged whether to be the time of the occurrence after the time element is extracted, accurate judgment can be carried out according to the semantics of the text clause where the time element is positioned. In a further embodiment, the recognition model of the time of occurrence is established and trained to recognize the expression content of the text clause, thereby determining whether the time element is the time of occurrence. The pattern generation time identification model comprises a pre-training model and a judging model.
Firstly, a database is established through a pre-training model, and training data in the database is derived from historical warning condition data of artificial marking case development time. Then comparing the text clause containing the time element in the warning text with the training data to determine the case sending time in the warning text; and the text clause data after discrimination is automatically marked and then is supplemented into the database, so that the content of the database is further enriched, and the occurrence time can be rapidly discriminated in the actual discrimination process.
Referring to fig. 2, the discriminant model includes an input layer, a hidden layer, and an output layer; the input layer is a text clause containing time elements for segmenting the warning text, and the node number is the number of the text clauses; the hidden layer is the data newly added into the database in the pre-training process and the original data in the database; the output layer determines whether the time element in the text clause is the case time or not through comparison, and the number of nodes of the output layer is equal to the number of the text clauses to be judged. And performing similarity comparison and discrimination on the text clause containing the time element input by the input layer and the comparison data of the hidden layer, and finally outputting a discrimination result by the output layer. Aiming at the situation that the judging process exceeds the data extension of the training database, the input text clause can be manually processed, and the processed data is fed into the database, so that the data of the hidden layer is gradually increased along with the increase of the training process. Therefore, as the number of text phrases processed by the discrimination model increases, the discrimination difficulty of the discrimination model becomes lower.
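The similarity comparison against a growing database can be illustrated with a short Python sketch. The patent does not name a similarity measure, so `difflib.SequenceMatcher`, the `min_similarity` cut-off, the `<TIME>` placeholder in the usage note, and the function names are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def classify_clause(clause, database, min_similarity=0.6):
    """Label a clause by its most similar labeled clause in the database.

    database is a list of (clause_text, is_case_time) tuples.
    Returns (label, score); label is None when nothing is similar enough,
    which stands for the "hand over to manual processing" case.
    """
    best_label, best_score = None, 0.0
    for known_clause, is_case_time in database:
        score = SequenceMatcher(None, clause, known_clause).ratio()
        if score > best_score:
            best_label, best_score = is_case_time, score
    if best_score < min_similarity:
        return None, best_score
    return best_label, best_score


def classify_and_grow(clause, database, min_similarity=0.6):
    """Classify a clause and, if a label was found, add it to the database."""
    label, _ = classify_clause(clause, database, min_similarity)
    if label is not None:
        database.append((clause, label))
    return label
```

With a toy database of `("bike was stolen at <TIME>", True)` and `("report received at <TIME>", False)`, an identical clause is labeled from its match and appended, while an unrelated clause returns `None` and would be routed to manual labeling.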
A cross-entropy loss function is used to measure the error of the discrimination results of the discrimination model, so as to increase their accuracy:
where $X_{ij}$ is a text clause sample containing a time element; $P(X_{ij})$ is the probability that the time element in the text clause is the case time; $Q(X_{ij})$ is the probability that it is not the case time, with $P(X_{ij}) + Q(X_{ij}) = 1$; M is the number of nodes of the hidden layer; N is the number of text clause samples containing time elements. The smaller the value of H(P, Q), the smaller the error of the discrimination result. Whether the discriminated case time is accepted can therefore be decided by comparing the measured error with a set value. For example, if H(P, Q) is smaller than the set value, the result is accepted and the time element in the discriminated text clause is taken as the case time; if H(P, Q) is not smaller than the set value, the result is not accepted, and whether that time element is the case time is determined by manual judgment.
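The acceptance rule above can be sketched as follows. The patent's own loss formula is not reproduced in this text, so the sketch substitutes the standard mean binary cross-entropy; that substitution, the function names, and the example threshold are assumptions.

```python
import math


def binary_cross_entropy(labels, probs, eps=1e-12):
    """Mean binary cross-entropy between 0/1 labels and predicted probabilities.

    probs[i] plays the role of P(X): the predicted probability that the time
    element of sample i is the case time; 1 - probs[i] plays the role of Q(X).
    """
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)


def accept_result(error, set_value):
    """Accept the discrimination result only when the error is below the set value."""
    return error < set_value
```

A batch with labels `[1, 0]` and predictions `[0.9, 0.1]` gives an error of `-log(0.9)`, which a set value of, say, 0.3 would accept; anything at or above the set value would go to manual judgment.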
The extracted time elements contain Chinese characters. For example, "7 p.m. on July 21, 2020" (written with the period word "evening" before the hour) and "6:49 on July 25, 2020" are both time elements, but "evening 7 o'clock" and "6 hours 49 minutes" are two different ways of expressing time. If they are not unified, digital archiving is hindered and police analysis and case handling become less convenient. Therefore, to process case times uniformly, they must be standardized. In a further embodiment, step 4 is further:
Step 41: directly determining the "year, month, day, hour" time elements through the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}[hour|point]", and executing the next step;
Step 42: if "night", "afternoon" or "evening" appears in the time element text and the hour matched by "[0-9]{0,2}[hour|point]" is less than 12, adding 12 to the hour;
Step 43: if the "day" element of the time element is missing and the time element contains "today", "yesterday" or "the day before yesterday", obtaining the corresponding "day" element by counting back 0, 1 or 2 days, respectively, from the alarm time;
Step 44: if a single element among "year, month, day, hour" is missing from the time element, filling it with the corresponding element of the previous time element;
Step 45: standardizing the time elements to form a standard case time in the 10-digit format "yyyymmddhh", in which digits 1-4 represent the year, digits 5-6 the month, digits 7-8 the day, and digits 9-10 the hour.
Therefore, the case times in the alert text, "7 p.m. on July 21, 2020", "6:49 on July 25, 2020" and "8:10 on July 25, 2020", are normalized to "2020072119", "2020072506" and "2020072508", respectively.
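Steps 41 to 45 can be sketched in Python as below. The Chinese tokens (年, 月, 日, 时, 点, 晚, 下午, 傍晚, 今, 昨, 前) are assumptions about what the translated tokens in the regular expressions stand for, and the function name, the `previous` argument, and the error handling are hypothetical. The sketch also requires at least one digit before the hour marker, unlike the `{0,2}` quantifier quoted in the text.

```python
import re
from datetime import datetime, timedelta

# Step 41 patterns; the Chinese unit characters are assumed originals.
FIELD_RES = {
    "year": re.compile(r"([0-9]{4})年"),
    "month": re.compile(r"([0-9]{1,2})月"),
    "day": re.compile(r"([0-9]{1,2})日"),
    "hour": re.compile(r"([0-9]{1,2})[时点]"),
}
PM_WORDS = ("晚", "下午", "傍晚")          # assumed night/afternoon/evening words
RELATIVE_DAYS = {"今": 0, "昨": 1, "前": 2}  # today, yesterday, day before yesterday


def normalize(element, alarm_time, previous=None):
    """Return (yyyymmddhh string, parsed fields) for one time element."""
    # Step 41: read year, month, day, hour directly where present.
    fields = {}
    for name, pattern in FIELD_RES.items():
        match = pattern.search(element)
        fields[name] = int(match.group(1)) if match else None
    # Step 42: shift afternoon/evening hours below 12 by 12.
    if fields["hour"] is not None and fields["hour"] < 12:
        if any(word in element for word in PM_WORDS):
            fields["hour"] += 12
    # Step 43: resolve a relative day by counting back from the alarm time.
    if fields["day"] is None:
        for word, days_back in RELATIVE_DAYS.items():
            if word in element:
                ref = alarm_time - timedelta(days=days_back)
                fields["day"] = ref.day
                if fields["year"] is None:
                    fields["year"] = ref.year
                if fields["month"] is None:
                    fields["month"] = ref.month
                break
    # Step 44: fill anything still missing from the previous time element.
    if previous is not None:
        for name in list(fields):
            if fields[name] is None:
                fields[name] = previous[name]
    if any(value is None for value in fields.values()):
        raise ValueError("time element is incomplete: " + element)
    # Step 45: emit the 10-digit standard format.
    standard = "{year:04d}{month:02d}{day:02d}{hour:02d}".format(**fields)
    return standard, fields
```

Passing the parsed fields of one element as `previous` for the next is one way to realise step 44; how the original system tracks the previous element is not stated.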
In an alert text, an incident may correspond to several time points separated by varying intervals, from a few hours to several days. If every time point is treated as a separate case time, time points separated by short intervals lose their association, which makes the case harder for police officers to solve. Therefore, in a further embodiment, standardized case times that are close together or clearly related are merged, and the merged case times are marked, which makes it easier for police officers to analyze the case. The specific process of step 5 is:
Step 51: judging whether two adjacent time elements appear in the same text clause; if they do and the former time is earlier than the latter, merging the standard case times corresponding to the two time elements to form a case time period; otherwise, executing the next step;
Step 52: calculating the difference in hours between two adjacent time elements; if they differ by less than 24 hours and the former time is earlier than the latter, merging the standard case times corresponding to the two time elements to form a case time period; otherwise, executing the next step;
Step 53: searching for keywords in the text clauses; if, for two adjacent time elements, the text clause of the former contains a keyword such as "start" or "begin", the text clause of the latter contains a keyword such as "end" or "finish", and the former time is earlier than the latter, merging the standard case times corresponding to the two time elements to form a case time period; otherwise, executing the next step;
Step 54: designating the standard case times corresponding to the remaining time elements as case time points;
Step 55: marking the case time periods and case time points in chronological order; for example, as "first case time", "second case time", "third case time", and so on.
In the example, the case time "7 p.m. on July 21, 2020" is a case time point, while "6:49 on July 25, 2020" and "8:10 on July 25, 2020" are merged into a case time period. After marking they read "first case time: 2020072119" and "second case time: 2020072506-2020072508". This not only facilitates digital archiving but also spares police officers from memorizing the case times: the marked case times show the order of events, which makes case handling considerably more convenient and lets police officers analyze the case quickly and accurately.
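The merging and marking of steps 51 to 55 can be sketched as below. The item layout (`time`, `clause_id`, `clause`), the Chinese start/end keywords, the label wording, and the left-to-right pairing of adjacent elements are assumptions made for illustration; the patent does not fix these details.

```python
from datetime import datetime, timedelta

START_WORDS = ("开始", "起")  # assumed "start" keywords
END_WORDS = ("结束", "止")    # assumed "end" keywords


def to_datetime(standard):
    """Parse a 10-digit yyyymmddhh string."""
    return datetime(int(standard[0:4]), int(standard[4:6]),
                    int(standard[6:8]), int(standard[8:10]))


def should_merge(first, second):
    """Apply steps 51-53 to two adjacent time elements."""
    t1, t2 = to_datetime(first["time"]), to_datetime(second["time"])
    if not t1 < t2:  # every rule requires the former time to be earlier
        return False
    if first["clause_id"] == second["clause_id"]:  # step 51: same clause
        return True
    if t2 - t1 < timedelta(hours=24):              # step 52: under 24 hours apart
        return True
    starts = any(word in first["clause"] for word in START_WORDS)
    ends = any(word in second["clause"] for word in END_WORDS)
    return starts and ends                         # step 53: start/end keywords


def merge_and_label(items):
    """Merge adjacent elements into periods, keep the rest as points, then label."""
    spans, i = [], 0
    while i < len(items):
        if i + 1 < len(items) and should_merge(items[i], items[i + 1]):
            spans.append((items[i]["time"], items[i + 1]["time"]))
            i += 2
        else:
            spans.append((items[i]["time"],))      # step 54: a single time point
            i += 1
    spans.sort(key=lambda span: span[0])           # step 55: chronological order
    return [("case time %d" % n, "-".join(span))
            for n, span in enumerate(spans, 1)]
```

Fed the three standard case times from the example, each in its own clause, the sketch returns one point (2020072119) and one period (2020072506-2020072508), matching the marked result described above.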
Example 2: Embodiment 2 of the invention provides a standard case time extraction system for alert text, comprising a first module, a second module, a third module, a fourth module and a fifth module;
the first module is used for sequentially extracting time elements from the alert text by named entity recognition;
the second module is used for segmenting the alert text into a plurality of text clauses and constructing key-value pairs of the text clauses and the time elements;
the third module is used for establishing and training a case time recognition model, and recognizing the content expressed in the text clauses through the case time recognition model so as to determine the case time;
the fourth module is used for standardizing the determined case time;
the fifth module is used for merging the standardized case times and marking the merged case times;
the first to fifth modules of the standard case time extraction system for alert text implement the standard case time extraction method for alert text of embodiment 1, so the system has the same technical effects as that method.
Example 3: Embodiment 3 of the present invention provides a computer processing system including a storage module; the storage module stores a computer program for implementing the standard case time extraction method for alert text according to any one of the embodiments. Because the computer processing system can implement that method, it has the same technical effects as the method.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (6)
1. A method for extracting the standard case time of alert text, characterized by comprising the following steps:
step 1: sequentially extracting time elements from the alert text by named entity recognition;
step 2: segmenting the alert text into a plurality of text clauses, and constructing key-value pairs of the text clauses and the time elements;
step 3: establishing and training a case time recognition model, and recognizing the content expressed in the text clauses through the case time recognition model to determine the case time;
step 4: standardizing the determined case time;
step 5: merging the standardized case times and marking the merged case times; wherein step 1 uses a regular expression to extract the time elements, the specific process being:
step 11: first removing the content in brackets from the alert text, so as to remove interfering time-element information contained in the bracketed content;
step 12: then extracting the time elements in the text using a regular expression, the regular expression being: ([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|before)?[\\u4E00-\\u9FA5]?(night|early|morning|afternoon|evening)?[\\u4E00-\\u9FA5]?([0-9]{0,2}[hour|point])?([0-9]{0,2}minute);
wherein:
([0-9]{4}year), representing four digits followed by "year", for matching the year;
([0-9]{1,2}month), representing one or two digits followed by "month", for matching the month;
([0-9]{1,2}day), representing one or two digits followed by "day", for matching the day;
(today|yesterday|before)[\\u4E00-\\u9FA5], for matching the relative date descriptions "today", "yesterday" and "the day before yesterday";
(night|early|morning|afternoon|evening)[\\u4E00-\\u9FA5], for matching period descriptions such as "evening", "morning" and "afternoon";
([0-9]{1,2}[hour|point]), representing one or two digits followed by "hour" or "point" (o'clock), for matching a specific hour;
([0-9]{1,2}minute), representing one or two digits followed by "minute", for matching a specific minute;
step 2 is further as follows:
first, arranging the extracted time elements in the order in which they appear in the alert text, and determining the first time as the alarm time;
then, segmenting the alert text into a plurality of text clauses by regular-expression matching on punctuation;
finally, determining the text clause containing each time element other than the alarm time; if a text clause contains a time element and its left and right neighbouring clauses do not, merging the neighbouring clauses that contain no time element with the clause containing the time element to form a new text clause; and constructing key-value pairs in which the time elements correspond one to one with the text clauses;
the case time recognition model in step 3 comprises a pre-training model and a discrimination model;
the training data in the database are derived from historical alert data in which the case time has been manually labeled, and the case time in the alert text is determined by comparing the text clauses containing time elements with the training data; the discriminated text clause data are automatically labeled and added to the database;
the discrimination model comprises an input layer, a hidden layer and an output layer; the input layer consists of the text clauses containing time elements obtained by segmenting the alert text, and its number of nodes is the number of such text clauses; the hidden layer consists of the data newly added to the database during pre-training together with the original data in the database; the output layer determines by comparison whether the time element in each text clause is the case time, and its number of nodes equals the number of text clauses to be discriminated; when an input falls outside the coverage of the training database, the input text clause is processed manually and the processed data are fed into the database, so that the hidden-layer data grow as training proceeds;
the discrimination model performs error calculation on the discrimination result:
where $X_{ij}$ is a text clause sample containing a time element, $P(X_{ij})$ is the probability that the time element in the text clause is the case time, $Q(X_{ij})$ is the probability that it is not the case time, with $P(X_{ij}) + Q(X_{ij}) = 1$, M is the number of nodes of the hidden layer, and N is the number of text clause samples containing time elements; the smaller the value of H(P, Q), the smaller the error of the discrimination result.
2. The method for extracting the standard case time of alert text according to claim 1, wherein step 4 is further:
step 41: directly determining the "year, month, day, hour" time elements through the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}[hour|point]", and executing the next step;
step 42: if "night", "afternoon" or "evening" appears in the time element text and the hour matched by "[0-9]{0,2}[hour|point]" is less than 12, adding 12 to the hour;
step 43: if the "day" element of the time element is missing and the time element contains "today", "yesterday" or "the day before yesterday", obtaining the corresponding "day" element by counting back 0, 1 or 2 days, respectively, from the alarm time;
step 44: if a single element among "year, month, day, hour" is missing from the time element, filling it with the corresponding element of the previous time element;
step 45: standardizing the time elements to form a standard case time in the 10-digit format "yyyymmddhh".
3. The method for extracting the standard case time of alert text according to claim 1, wherein step 5 is further:
step 51: judging whether two adjacent time elements appear in the same text clause; if they do and the former time is earlier than the latter, merging the standard case times corresponding to the two time elements to form a case time period; otherwise, executing the next step;
step 52: calculating the difference in hours between two adjacent time elements; if they differ by less than 24 hours and the former time is earlier than the latter, merging the standard case times corresponding to the two time elements to form a case time period; otherwise, executing the next step;
step 53: searching for keywords in the text clauses; if, for two adjacent time elements, the text clause of the former contains a keyword such as "start" or "begin", the text clause of the latter contains a keyword such as "end" or "finish", and the former time is earlier than the latter, merging the standard case times corresponding to the two time elements to form a case time period; otherwise, executing the next step;
step 54: designating the standard case times corresponding to the remaining time elements as case time points;
step 55: marking the case time periods and the case time points in chronological order.
4. A standard case time extraction system for alert text, characterized by comprising:
a first module for sequentially extracting time elements from the alert text by named entity recognition;
a second module for segmenting the alert text into a plurality of text clauses and constructing key-value pairs of the text clauses and the time elements;
a third module for establishing and training a case time recognition model, and recognizing the content expressed in the text clauses through the case time recognition model to determine the case time;
a fourth module for standardizing the determined case time;
a fifth module for merging the standardized case times and marking the merged case times; wherein the first module extracts the time elements using a regular expression, first removing the content in brackets from the alert text so as to remove interfering time-element information contained in the bracketed content, and then extracting the time elements in the text using the regular expression, the regular expression being:
([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|before)?[\\u4E00-\\u9FA5]?(night|early|morning|afternoon|evening)?[\\u4E00-\\u9FA5]?([0-9]{0,2}[hour|point])?([0-9]{0,2}minute);
wherein:
([0-9]{4}year), representing four digits followed by "year", for matching the year;
([0-9]{1,2}month), representing one or two digits followed by "month", for matching the month;
([0-9]{1,2}day), representing one or two digits followed by "day", for matching the day;
(today|yesterday|before)[\\u4E00-\\u9FA5], for matching the relative date descriptions "today", "yesterday" and "the day before yesterday";
(night|early|morning|afternoon|evening)[\\u4E00-\\u9FA5], for matching period descriptions such as "evening", "morning" and "afternoon";
([0-9]{1,2}[hour|point]), representing one or two digits followed by "hour" or "point" (o'clock), for matching a specific hour;
([0-9]{1,2}minute), representing one or two digits followed by "minute", for matching a specific minute;
the second module first arranges the extracted time elements in the order in which they appear in the alert text and determines the first time as the alarm time; then segments the alert text into a plurality of text clauses by regular-expression matching on punctuation; and finally determines the text clause containing each time element other than the alarm time; if a text clause contains a time element and its left and right neighbouring clauses do not, the neighbouring clauses that contain no time element are merged with the clause containing the time element to form a new text clause; key-value pairs are constructed in which the time elements correspond one to one with the text clauses;
the third module establishes and trains a case time recognition model, the case time recognition model comprising a pre-training model and a discrimination model;
first, a database is established in the pre-training model; the training data in the database are derived from historical alert data in which the case time has been manually labeled, and the case time in the alert text is determined by comparing the text clauses containing time elements with the training data; the discriminated text clause data are automatically labeled and added to the database;
the discrimination model comprises an input layer, a hidden layer and an output layer; the input layer consists of the text clauses containing time elements obtained by segmenting the alert text, and its number of nodes is the number of such text clauses; the hidden layer consists of the data newly added to the database during pre-training together with the original data in the database; the output layer determines by comparison whether the time element in each text clause is the case time, and its number of nodes equals the number of text clauses to be discriminated; when an input falls outside the coverage of the training database, the input text clause is processed manually and the processed data are fed into the database, so that the hidden-layer data grow as training proceeds;
the discrimination model performs error calculation on the discrimination result:
where $X_{ij}$ is a text clause sample containing a time element, $P(X_{ij})$ is the probability that the time element in the text clause is the case time, $Q(X_{ij})$ is the probability that it is not the case time, with $P(X_{ij}) + Q(X_{ij}) = 1$, M is the number of nodes of the hidden layer, and N is the number of text clause samples containing time elements; the smaller the value of H(P, Q), the smaller the error of the discrimination result.
5. The standard case time extraction system for alert text according to claim 4, wherein
the fourth module directly determines the "year, month, day, hour" time elements through the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}[hour|point]"; when "night", "afternoon" or "evening" appears in the time element text and the hour matched by "[0-9]{0,2}[hour|point]" is less than 12, 12 is added to the hour; when the "day" element of the time element is missing and the time element contains "today", "yesterday" or "the day before yesterday", the corresponding "day" element is obtained by counting back 0, 1 or 2 days, respectively, from the alarm time; when a single element among "year, month, day, hour" is missing from the time element, it is filled with the corresponding element of the previous time element; finally, the time elements are standardized to form a standard case time in the 10-digit format "yyyymmddhh";
the fifth module first judges whether two adjacent time elements appear in the same text clause, and when they do and the former time is earlier than the latter, merges the standard case times corresponding to the two time elements to form a case time period; calculates the difference in hours between two adjacent time elements, and when they differ by less than 24 hours and the former time is earlier than the latter, merges the standard case times corresponding to the two time elements to form a case time period; searches for keywords in the text clauses, and when, for two adjacent time elements, the text clause of the former contains a keyword such as "start" or "begin", the text clause of the latter contains a keyword such as "end" or "finish", and the former time is earlier than the latter, merges the standard case times corresponding to the two time elements to form a case time period; designates the standard case times corresponding to the remaining time elements as case time points; and finally marks the case time periods and the case time points in chronological order.
6. A computer processing system comprising a storage module, wherein the storage module stores a computer program for implementing the standard case time extraction method of alert text according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011195667.3A CN112541075B (en) | 2020-10-30 | 2020-10-30 | Standard case sending time extraction method and system for alert text |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112541075A CN112541075A (en) | 2021-03-23 |
CN112541075B (en) | 2024-04-05 |
Family
ID=75013660
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011195667.3A Active CN112541075B (en) | 2020-10-30 | 2020-10-30 | Standard case sending time extraction method and system for alert text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112541075B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||