CN112541075B - Standard case sending time extraction method and system for alert text - Google Patents


Info

Publication number
CN112541075B
Authority
CN
China
Prior art keywords
time
text
elements
clause
day
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011195667.3A
Other languages
Chinese (zh)
Other versions
CN112541075A (en)
Inventor
叶恺翔
吕晓宝
王坚
胡祥月
宋剑锋
王元兵
王海荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sugon Nanjing Research Institute Co ltd
Original Assignee
Sugon Nanjing Research Institute Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sugon Nanjing Research Institute Co ltd filed Critical Sugon Nanjing Research Institute Co ltd
Priority to CN202011195667.3A priority Critical patent/CN112541075B/en
Publication of CN112541075A publication Critical patent/CN112541075A/en
Application granted granted Critical
Publication of CN112541075B publication Critical patent/CN112541075B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10Services
    • G06Q50/26Government or public services

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Tourism & Hospitality (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Databases & Information Systems (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The invention discloses a method and a system for extracting the standard case occurrence time (case sending time) from alert text, and belongs to the technical field of alert text extraction. The method comprises the following steps: sequentially extracting the time elements in the alert text by named entity recognition; segmenting the alert text into a plurality of text clauses, and constructing key-value pairs of the text clauses and the time elements; establishing and training a case occurrence time recognition model, and recognizing the content expressed in each text clause through the model to determine the case occurrence time; standardizing the determined case occurrence time; and merging the standardized case occurrence times and labeling the merged result. By adding a case occurrence time recognition model on top of the time elements obtained through named entity recognition, the invention accurately identifies and extracts case occurrence time information, and provides convenience and support for police officers to analyze and verify alerts quickly and accurately.

Description

Standard case sending time extraction method and system for alert text
Technical Field
The invention belongs to the technical field of alert text extraction, and particularly relates to a method and a system for extracting the standard case occurrence time (case sending time) of alert text.
Background
Techniques for extracting time elements from text are mature: methods such as named entity recognition, regular expressions and sequence labeling models all achieve good results. A regular expression matches text against a fixed template of time expressions; a sequence labeling model relies on text labeled in advance, so that the machine learns the features of time elements in the text sequence from manual labels.
However, existing techniques do not address how, in a police system, to distinguish the attribute of each time element in an alert text, convert it into a standard time format, and reason about the relations among several times. The time elements in an alert text are divided into the alarm time, the case occurrence time, other background times and so on, where the case occurrence time is a time period or a time point in a specific scenario. Existing models have difficulty extracting the case occurrence time from alert text accurately, which greatly increases the workload of police officers.
Disclosure of Invention
The invention provides a method and a system for extracting the standard case occurrence time of alert text, so as to solve the above problems in the prior art.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
a method for extracting the standard case occurrence time of alert text, comprising the following steps:
step 1: sequentially extracting the time elements in the alert text by named entity recognition;
step 2: segmenting the alert text into a plurality of text clauses, and constructing key-value pairs of the text clauses and the time elements;
step 3: establishing and training a case occurrence time recognition model, and recognizing the content expressed in each text clause through the model to determine the case occurrence time;
step 4: standardizing the determined case occurrence time;
step 5: merging the standardized case occurrence times, and labeling the merged case occurrence times.
In a further embodiment, step 1 uses a regular expression to extract the time elements, and the specific process is as follows:
step 11: first removing the bracketed content in the alert text, so as to eliminate interfering time elements contained in the brackets;
step 12: then extracting the time elements in the text by using a regular expression, wherein the regular expression is: ([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|before)?[\u4E00-\u9FA5]?(night|early|morning|afternoon|evening)?[\u4E00-\u9FA5]?([0-9]{0,2}[hour|point])?([0-9]{0,2}minute);
wherein:
([0-9]{4}year), four digits followed by "year", is used to match the year;
([0-9]{1,2}month), one or two digits followed by "month", is used to match the month;
([0-9]{1,2}day), one or two digits followed by "day", is used to match the day;
(today|yesterday|before)[\u4E00-\u9FA5] is used to match the relative date descriptions "today", "yesterday" and "the day before yesterday";
(night|early|morning|afternoon|evening)[\u4E00-\u9FA5] is used to match period descriptions such as "evening", "morning" and "afternoon";
([0-9]{1,2}[hour|point]), one or two digits followed by "hour" or "point" (o'clock), is used to match a specific hour;
([0-9]{1,2}minute), one or two digits followed by "minute", is used to match a specific minute.
In a further embodiment, step 2 further comprises:
first, arranging the extracted time elements in the order in which they appear in the alert text, and taking the first time element as the alarm time;
then, segmenting the alert text into a plurality of text clauses by regular-expression matching of punctuation marks;
finally, determining the text clause in which each time element other than the alarm time is located; if a text clause contains a time element and its left and right neighboring clauses do not, merging those neighboring clauses with the clause containing the time element to form a new text clause; and constructing key-value pairs in which the time elements correspond one-to-one to the text clauses.
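The clause segmentation and pairing described in step 2 can be sketched as follows. This is only an illustrative sketch: the punctuation set, the function name build_key_value_pairs, the sample sentence, and the choice to attach each neighboring clause individually when it has no time element are all assumptions made for the example, not details given in the text.

```python
import re

# Assumed punctuation set used to cut the alert text into clauses.
CLAUSE_SPLIT = re.compile(r"[，。；：！？,;:!?]")

def build_key_value_pairs(text, time_elements):
    """Return (alarm_time, {time_element: context_clause}).

    time_elements must be listed in the order they appear in the text;
    the first one is treated as the alarm time.
    """
    alarm_time, rest = time_elements[0], time_elements[1:]
    clauses = [c for c in CLAUSE_SPLIT.split(text) if c]
    has_time = [any(t in c for t in time_elements) for c in clauses]
    pairs = {}
    for t in rest:
        for i, clause in enumerate(clauses):
            if t not in clause:
                continue
            parts = []
            # Attach a neighbouring clause only if it holds no time element.
            if i > 0 and not has_time[i - 1]:
                parts.append(clauses[i - 1])
            parts.append(clause)
            if i + 1 < len(clauses) and not has_time[i + 1]:
                parts.append(clauses[i + 1])
            pairs[t] = "，".join(parts)
            break
    return alarm_time, pairs

# Hypothetical sample text and pre-extracted time elements.
TEXT = "2020年07月27日10时40分，张三报警称：2020年7月21日晚7点，其银行卡被盗刷，损失100元。"
ELEMENTS = ["2020年07月27日10时40分", "2020年7月21日晚7点"]
alarm, pairs = build_key_value_pairs(TEXT, ELEMENTS)
```

Here the clause holding the second time element is merged with the clauses on both sides, since neither of them contains a time element of its own.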
In a further embodiment, the case occurrence time recognition model in step 3 comprises a pre-training model and a discrimination model;
a database is established through the pre-training model; the training data in the database are derived from historical alert data in which the case occurrence time has been labeled manually; the case occurrence time in the alert text is determined by comparing the text clauses containing time elements with the training data; and the discriminated text clause data are labeled automatically and added to the database;
the discrimination model comprises an input layer, a hidden layer and an output layer; the input layer receives the text clauses containing time elements obtained by segmenting the alert text, and its number of nodes is the number of such text clauses; the hidden layer holds the data newly added to the database during pre-training together with the original data in the database; the output layer determines by comparison whether the time element in each text clause is the case occurrence time, and its number of nodes equals the number of text clauses to be discriminated; when the discrimination process encounters data beyond the scope of the training database, the input text clauses are processed manually and the processed data are fed back into the database, so that the data of the hidden layer grow gradually as training proceeds;
the discrimination model calculates the error H(P, Q) of the discrimination result, wherein X_ij is a text clause sample containing a time element, P(X_ij) is the probability that the time element in the text clause is the case occurrence time, Q(X_ij) is the probability that it is not the case occurrence time, and P(X_ij) + Q(X_ij) = 1; M is the number of nodes of the hidden layer, and N is the number of text clause samples containing time elements; the smaller the value of H(P, Q), the smaller the error of the discrimination result.
In a further embodiment, step 4 further comprises:
step 41: directly determining the "year, month, day, hour" elements of the time element through the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}[hour|point]", and executing the next step;
step 42: if "night", "afternoon" or "evening" appears in the text of the time element and the number of hours matched by "[0-9]{0,2}[hour|point]" is less than 12, adding 12 to the number of hours;
step 43: if the "day" element of the time element is missing and the time element contains "today", "yesterday" or "the day before yesterday", obtaining the corresponding "day" element by counting back 0, 1 or 2 days, respectively, from the alarm time;
step 44: if any single element among "year, month, day, hour" is missing from the time element, filling it with the corresponding element of the previous time element;
step 45: standardizing the time element to form the standard case occurrence time in the 10-digit format "yyyymmddhh".
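Steps 41 to 45 can be illustrated with a short sketch. The Chinese literals (年, 月, 日, 时, 点, 今, 昨, 前, 晚, 下午), the function names, and the dictionary representation of a time are assumptions made for this example; the sketch also takes year and month from the alarm date when a relative day is resolved, which is a choice of the example so that "yesterday" on the first day of a month still yields a valid date.

```python
import re
from datetime import date, timedelta

# Assumed Chinese literals for the four fields (step 41).
FIELD_PATTERNS = {
    "year": r"([0-9]{4})年",
    "month": r"([0-9]{1,2})月",
    "day": r"([0-9]{1,2})日",
    "hour": r"([0-9]{1,2})[时点]",
}
RELATIVE_DAYS = {"今": 0, "昨": 1, "前": 2}   # today, yesterday, day before
PM_WORDS = ("晚", "下午")                      # evening/night, afternoon

def normalize(element, previous, alarm):
    """Return the yyyymmddhh string for one time element.

    previous and alarm are dicts with int keys year/month/day/hour.
    """
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        m = re.search(pattern, element)
        fields[name] = int(m.group(1)) if m else None
    # Step 42: convert afternoon/evening hours to the 24-hour clock.
    if fields["hour"] is not None and fields["hour"] < 12:
        if any(word in element for word in PM_WORDS):
            fields["hour"] += 12
    # Step 43: resolve a relative day against the alarm date.
    if fields["day"] is None:
        for char, back in RELATIVE_DAYS.items():
            if char in element:
                d = date(alarm["year"], alarm["month"], alarm["day"])
                d -= timedelta(days=back)
                fields.update(year=d.year, month=d.month, day=d.day)
                break
    # Step 44: fill anything still missing from the previous element.
    for name, value in fields.items():
        if value is None:
            fields[name] = previous[name]
    # Step 45: 10-digit standard format.
    return "{year:04d}{month:02d}{day:02d}{hour:02d}".format(**fields)

ALARM = {"year": 2020, "month": 7, "day": 27, "hour": 10}
a = normalize("2020年7月21日晚7点", ALARM, ALARM)
b = normalize("昨天下午3点", ALARM, ALARM)
c = normalize("8时10分", {"year": 2020, "month": 7, "day": 25, "hour": 6}, ALARM)
```

With the alarm date of July 27, 2020, the three hypothetical inputs normalize to "2020072119", "2020072615" and "2020072508" respectively.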
In a further embodiment, step 5 further comprises:
step 51: judging whether two adjacent time elements appear in the same text clause; when they appear in the same text clause and the former time is earlier than the latter, merging the standard case occurrence times corresponding to the two time elements to form a case occurrence time period; otherwise, executing the next step;
step 52: calculating the difference in hours between two adjacent time elements; when the two differ by less than 24 hours and the former time is earlier than the latter, merging the standard case occurrence times corresponding to the two time elements to form a case occurrence time period; otherwise, executing the next step;
step 53: searching for keywords in the text clauses; when the text clause corresponding to the former of two adjacent time elements contains a start keyword (such as "start"), the text clause corresponding to the latter contains an end keyword (such as "end"), and the former time is earlier than the latter, merging the standard case occurrence times corresponding to the two time elements to form a case occurrence time period; otherwise, executing the next step;
step 54: taking the standard case occurrence times corresponding to the remaining time elements as case occurrence time points;
step 55: labeling the case occurrence time periods and the case occurrence time points in chronological order.
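The merging rules of steps 51 to 55 can be sketched as a single pass over adjacent pairs. The keyword tuples below are English placeholders, and the function name, the tuple-based output and the greedy left-to-right pairing are assumptions made for illustration only.

```python
from datetime import datetime, timedelta

# Placeholder keywords; the actual keyword lists are not given here.
START_WORDS = ("start", "begin")
END_WORDS = ("end", "finish")

def merge_times(items):
    """items: list of (yyyymmddhh, clause) in text order.

    Returns labelled periods and points sorted chronologically.
    """
    parsed = [(datetime.strptime(t, "%Y%m%d%H"), t, c) for t, c in items]
    periods, points = [], []
    i = 0
    while i < len(parsed):
        if i + 1 < len(parsed):
            (d1, t1, c1), (d2, t2, c2) = parsed[i], parsed[i + 1]
            same_clause = c1 == c2                              # step 51
            within_day = d2 - d1 < timedelta(hours=24)          # step 52
            keywords = (any(w in c1 for w in START_WORDS)
                        and any(w in c2 for w in END_WORDS))    # step 53
            if d1 < d2 and (same_clause or within_day or keywords):
                periods.append((t1, t2))
                i += 2
                continue
        points.append(parsed[i][1])                             # step 54
        i += 1
    labelled = [("period", p) for p in periods]
    labelled += [("point", p) for p in points]
    # Step 55: order by the start of each period or by the point itself.
    labelled.sort(key=lambda x: x[1][0] if x[0] == "period" else x[1])
    return labelled

ITEMS = [
    ("2020072119", "card was swiped once"),
    ("2020072506", "two more charges between 6:49 and 8:10"),
    ("2020072508", "two more charges between 6:49 and 8:10"),
]
result = merge_times(ITEMS)
```

In this hypothetical input the last two times share a clause and are merged into a period, while the first remains a single time point.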
A system for extracting the standard case occurrence time of alert text, comprising:
a first module for sequentially extracting the time elements in the alert text by named entity recognition;
a second module for segmenting the alert text into a plurality of text clauses and constructing key-value pairs of the text clauses and the time elements;
a third module for establishing and training a case occurrence time recognition model, and recognizing the content expressed in each text clause through the model to determine the case occurrence time;
a fourth module for standardizing the determined case occurrence time;
and a fifth module for merging the standardized case occurrence times and labeling the merged case occurrence times.
In a further embodiment, the first module extracts the time elements by using a regular expression: it first removes the bracketed content in the alert text, so as to eliminate interfering time elements contained in the brackets, and then extracts the time elements in the text by using the regular expression, where the regular expression is:
([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|before)?[\u4E00-\u9FA5]?(night|early|morning|afternoon|evening)?[\u4E00-\u9FA5]?([0-9]{0,2}[hour|point])?([0-9]{0,2}minute);
wherein:
([0-9]{4}year), four digits followed by "year", is used to match the year;
([0-9]{1,2}month), one or two digits followed by "month", is used to match the month;
([0-9]{1,2}day), one or two digits followed by "day", is used to match the day;
(today|yesterday|before)[\u4E00-\u9FA5] is used to match the relative date descriptions "today", "yesterday" and "the day before yesterday";
(night|early|morning|afternoon|evening)[\u4E00-\u9FA5] is used to match period descriptions such as "evening", "morning" and "afternoon";
([0-9]{1,2}[hour|point]), one or two digits followed by "hour" or "point" (o'clock), is used to match a specific hour;
([0-9]{1,2}minute), one or two digits followed by "minute", is used to match a specific minute;
the second module first arranges the extracted time elements in the order in which they appear in the alert text, and takes the first time element as the alarm time; it then segments the alert text into a plurality of text clauses by regular-expression matching of punctuation marks; finally, it determines the text clause in which each time element other than the alarm time is located; if a text clause contains a time element and its left and right neighboring clauses do not, it merges those neighboring clauses with the clause containing the time element to form a new text clause; and it constructs key-value pairs in which the time elements correspond one-to-one to the text clauses.
The third module establishes and trains a case occurrence time recognition model, wherein the model comprises a pre-training model and a discrimination model;
a database is established through the pre-training model; the training data in the database are derived from historical alert data in which the case occurrence time has been labeled manually; the case occurrence time in the alert text is determined by comparing the text clauses containing time elements with the training data; and the discriminated text clause data are labeled automatically and added to the database;
the discrimination model comprises an input layer, a hidden layer and an output layer; the input layer receives the text clauses containing time elements obtained by segmenting the alert text, and its number of nodes is the number of such text clauses; the hidden layer holds the data newly added to the database during pre-training together with the original data in the database; the output layer determines by comparison whether the time element in each text clause is the case occurrence time, and its number of nodes equals the number of text clauses to be discriminated; when the discrimination process encounters data beyond the scope of the training database, the input text clauses are processed manually and the processed data are fed back into the database, so that the data of the hidden layer grow gradually as training proceeds;
the discrimination model calculates the error H(P, Q) of the discrimination result, wherein X_ij is a text clause sample containing a time element, P(X_ij) is the probability that the time element in the text clause is the case occurrence time, Q(X_ij) is the probability that it is not the case occurrence time, and P(X_ij) + Q(X_ij) = 1; M is the number of nodes of the hidden layer, and N is the number of text clause samples containing time elements; the smaller the value of H(P, Q), the smaller the error of the discrimination result.
In a further embodiment, the fourth module first determines the "year, month, day, hour" elements of the time element directly through the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}[hour|point]"; when "night", "afternoon" or "evening" appears in the text of the time element and the number of hours matched by "[0-9]{0,2}[hour|point]" is less than 12, 12 is added to the number of hours; when the "day" element of the time element is missing and the time element contains "today", "yesterday" or "the day before yesterday", the corresponding "day" element is obtained by counting back 0, 1 or 2 days, respectively, from the alarm time; when any single element among "year, month, day, hour" is missing, it is filled with the corresponding element of the previous time element; finally, the time element is standardized to form the standard case occurrence time in the 10-digit format "yyyymmddhh";
The fifth module first judges whether two adjacent time elements appear in the same text clause; when they do and the former time is earlier than the latter, the standard case occurrence times corresponding to the two time elements are merged to form a case occurrence time period; it then calculates the difference in hours between two adjacent time elements, and when the two differ by less than 24 hours and the former time is earlier than the latter, the corresponding standard case occurrence times are merged to form a case occurrence time period; it then searches for keywords in the text clauses, and when the text clause corresponding to the former of two adjacent time elements contains a start keyword (such as "start"), the text clause corresponding to the latter contains an end keyword (such as "end"), and the former time is earlier than the latter, the corresponding standard case occurrence times are merged to form a case occurrence time period; the standard case occurrence times corresponding to the remaining time elements are taken as case occurrence time points; finally, the case occurrence time periods and time points are labeled in chronological order.
A computer processing system comprising a storage module in which is stored a computer program for a standard case time extraction method for alert text in any of the embodiments described above.
The beneficial effects are as follows: first, a case occurrence time recognition model is added on top of the time elements obtained through named entity recognition, so that case occurrence time information is accurately identified and extracted, which makes it more convenient for police officers to analyze and verify alerts quickly;
second, the alert text is segmented into a plurality of text clauses containing time elements, key-value pairs of the text clauses and the time elements are constructed, and semantic recognition is performed on each text clause to judge whether its time element is the case occurrence time, which reduces the cases in which the case occurrence time is hard to identify and extract because the content of the alert text is complex;
finally, the case occurrence times are merged and labeled as case occurrence time points and case occurrence time periods, which supports police officers in analyzing alerts quickly and accurately.
Drawings
FIG. 1 is a flow chart of the standard case occurrence time extraction of alert text according to the invention.
Fig. 2 is a schematic structural diagram of the discrimination model of the present invention.
Detailed Description
The technical scheme of the invention will be clearly and completely described below with reference to the accompanying drawings and examples. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without any inventive effort, are intended to be within the scope of the invention.
Research shows that the time elements in an alert text are divided into the alarm time, the case occurrence time, other background times and so on. Existing public security alert systems have difficulty distinguishing the attributes of the time elements in an alert text and reasoning about the relations among several times, and therefore cannot accurately identify and extract the case occurrence time. This work generally requires police officers to identify and label the times manually, which greatly increases their workload.
Example 1: as shown in fig. 1, in order to solve the problems existing in the prior art, embodiment 1 of the present invention provides a method for extracting the standard case occurrence time of alert text, comprising the following steps:
step 1: sequentially extracting the time elements in the alert text by named entity recognition;
step 2: segmenting the alert text into a plurality of text clauses, and constructing key-value pairs of the text clauses and the time elements;
step 3: establishing and training a case occurrence time recognition model, and recognizing the content expressed in each text clause through the model to determine the case occurrence time;
step 4: standardizing the determined case occurrence time;
step 5: merging the standardized case occurrence times, and labeling the merged case occurrence times.
Alert text is complex in content and contains many time elements with different attributes; for example, the time elements are mainly divided into the alarm time, the case occurrence time and other background times. Time elements with different attributes greatly increase the difficulty of extracting the case occurrence time. To describe the embodiment in more detail, the present application provides a simple alert text: "At 10:40 on July 27, 2020, the reporting person called and stated: at 7 o'clock in the evening on July 21, 2020, the bank card of the reporting person (registered residence: xxx, ID card number: xxx, date of birth: xxx) was fraudulently used for 1 transaction, with a loss of 100 yuan; from 6:49 on July 25, 2020 to 8:10 on July 25, it was fraudulently used for 2 transactions, with a total loss of 200 yuan. The person went to the bank to report the loss at 9 a.m. on July 26, 2020." In this text, "10:40 on July 27, 2020" is the alarm time; "7 o'clock in the evening on July 21, 2020" and "6:49 on July 25, 2020 to 8:10 on July 25" are case occurrence times; and the date of birth of the reporting person is another background time. The attributes of the time elements are complex, which greatly increases the difficulty of recognizing the case occurrence time. In addition, some time elements in alert text are in a non-standard format, such as the colloquial "7 o'clock in the evening", which further increases the difficulty of recognizing time elements.
Therefore, in order to determine the attributes of the time elements accurately, the useful time elements in the alert text must first be extracted accurately. Further, step 1 uses a regular expression to extract the time elements, and the specific process is as follows:
first, the bracketed content in the alert text is removed, so as to eliminate interfering time elements contained in the brackets. For example, the personal details of the reporting person in an alert text usually include the date of birth, which interferes with the extraction of time elements from the text; such interfering time elements should be eliminated first;
then, the time elements in the text are extracted by using a regular expression, wherein the regular expression is as follows:
([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|before)?[\u4E00-\u9FA5]?(night|early|morning|afternoon|evening)?[\u4E00-\u9FA5]?([0-9]{0,2}[hour|point])?([0-9]{0,2}minute);
wherein:
([0-9]{4}year), four digits followed by "year", is used to match the year;
([0-9]{1,2}month), one or two digits followed by "month", is used to match the month;
([0-9]{1,2}day), one or two digits followed by "day", is used to match the day;
(today|yesterday|before)[\u4E00-\u9FA5] is used to match the relative date descriptions "today", "yesterday" and "the day before yesterday"; the wildcard for a single Chinese character widens the range of words matched and stays close to the way reporting persons describe dates in everyday speech;
(night|early|morning|afternoon|evening)[\u4E00-\u9FA5] is used to match period descriptions such as "evening", "morning" and "afternoon"; the wildcard for a single Chinese character widens the range of words matched, handles the non-standard time elements that spoken language introduces into alert text, avoids a colloquial expression such as "7 o'clock in the evening" being extracted incompletely, and ensures the accuracy of time element extraction;
([0-9]{1,2}[hour|point]), one or two digits followed by "hour" or "point" (o'clock), is used to match a specific hour;
([0-9]{1,2}minute), one or two digits followed by "minute", is used to match a specific minute;
processing the sample alert text with the regular expression extracts, in order, the time elements "10:40 on July 27, 2020", "7 o'clock in the evening on July 21, 2020", "6:49 on July 25, 2020", "8:10 on July 25" and "9 a.m. on July 26, 2020".
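The bracket removal and regular-expression extraction can be sketched as below. The pattern is a hypothetical reconstruction with assumed Chinese literals; for simplicity it spells out the relative-day and period words instead of using the single-character wildcard, and it filters out empty or digit-free matches, which is a choice of this example. The sample sentence is likewise invented for the sketch.

```python
import re

# Hypothetical reconstruction; the Chinese literals are assumptions.
RELATIVE_DAYS = ("今天", "昨天", "前天")
TIME_RE = re.compile(
    r"(?:[0-9]{4}年)?"                 # year
    r"(?:[0-9]{1,2}月)?"               # month
    r"(?:[0-9]{1,2}日)?"               # day
    r"(?:今天|昨天|前天)?"              # relative day
    r"(?:上午|下午|晚上|早上|晚|早)?"    # period of the day
    r"(?:[0-9]{1,2}[时点])?"           # hour
    r"(?:[0-9]{1,2}分)?"               # minute
)
# Matches one pair of half-width or full-width brackets and its content.
BRACKETS_RE = re.compile(r"[（(][^（()）]*[）)]")

def strip_brackets(text):
    """Drop bracketed content such as personal details of the caller."""
    return BRACKETS_RE.sub("", text)

def extract_time_elements(text):
    """Return time elements in order of appearance."""
    found = []
    for match in TIME_RE.finditer(strip_brackets(text)):
        s = match.group()
        # Every group is optional, so skip empty or digit-free matches
        # unless they begin with a relative-day word.
        if any(ch.isdigit() for ch in s) or s.startswith(RELATIVE_DAYS):
            found.append(s)
    return found

SAMPLE = ("2020年07月27日10时40分，张三报警称：2020年7月21日晚7点，"
          "其银行卡（出生日期：1990年1月1日）被盗刷1笔，损失100元。"
          "2020年7月26日上午9点到银行挂失。")
elements = extract_time_elements(SAMPLE)
```

On this sample the date of birth inside the brackets is discarded before matching, and amounts such as "100元" are not picked up because no time unit follows the digits.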
Because alert text is complex in content and contains many time elements, judging the case occurrence time directly on the whole text is difficult. The alert text is therefore segmented into a plurality of text clauses, and semantic recognition is performed on each text clause to judge whether its time element is the case occurrence time. Thus, in a further embodiment, step 2 is specifically performed as follows:
first, the extracted time elements are arranged in the order in which they appear in the alert text, and the first time element is taken as the alarm time; the alarm time of the sample alert text is "10:40 on July 27, 2020";
then, the alert text is segmented into a plurality of text clauses by regular-expression matching of punctuation marks; at this point some text clauses contain no time element, but their content is needed to analyze whether a time element is the case occurrence time, so they cannot simply be discarded; instead, the clauses without a time element are merged into the clauses with a time element, which completes the context of the latter and makes it easier to judge the case occurrence time accurately;
finally, the text clause in which each time element other than the alarm time is located is determined; if a text clause contains a time element and its left and right neighboring clauses do not, those neighboring clauses are merged with the clause containing the time element to form a new text clause; key-value pairs in which the time elements correspond one-to-one to the text clauses are then constructed, and the key-value pairs of time elements and text clauses constructed from the sample alert text are as follows:
Once the time elements have been extracted, whether a time element is the case occurrence time can be judged accurately from the semantics of the text clause in which it is located. In a further embodiment, a case occurrence time recognition model is established and trained to recognize the content expressed in each text clause and thereby determine whether its time element is the case occurrence time. The case occurrence time recognition model comprises a pre-training model and a discrimination model.
First, a database is established through the pre-training model; the training data in the database are derived from historical alert data in which the case occurrence time has been labeled manually. The text clauses containing time elements in the alert text are then compared with the training data to determine the case occurrence time in the alert text; the discriminated text clause data are labeled automatically and added to the database, which further enriches the database so that the case occurrence time can be discriminated quickly in actual use.
Referring to fig. 2, the discrimination model comprises an input layer, a hidden layer and an output layer; the input layer receives the text clauses containing time elements obtained by segmenting the alert text, and its number of nodes is the number of such text clauses; the hidden layer holds the data newly added to the database during pre-training together with the original data in the database; the output layer determines by comparison whether the time element in each text clause is the case occurrence time, and its number of nodes equals the number of text clauses to be discriminated. The text clauses containing time elements received by the input layer are compared for similarity with the reference data of the hidden layer, and the output layer finally outputs the discrimination result. When the discrimination process encounters data beyond the scope of the training database, the input text clauses can be processed manually and the processed data fed back into the database, so that the data of the hidden layer grow gradually as training proceeds. Therefore, the more text clauses the discrimination model has processed, the easier discrimination becomes.
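The compare-and-feed-back behavior described above can be illustrated with a small sketch. The similarity measure is not specified in the text; the sketch substitutes the character-level ratio from Python's difflib as a stand-in, and the function name, the 0.6 cut-off and the sample database entries are all assumptions made for the example.

```python
from difflib import SequenceMatcher

def discriminate(clause, database, min_similarity=0.6):
    """Compare a clause with labelled clauses and return (label, score).

    database is a list of (clause_text, is_case_time) pairs. A label of
    None means no stored clause is similar enough, so the clause should
    be handled manually and then added to the database.
    """
    best_label, best_score = None, 0.0
    for known_clause, label in database:
        score = SequenceMatcher(None, clause, known_clause).ratio()
        if score > best_score:
            best_label, best_score = label, score
    if best_score < min_similarity:
        return None, best_score
    # Automatically labelled clauses are fed back to enrich the database.
    database.append((clause, best_label))
    return best_label, best_score

# Hypothetical labelled history.
DATABASE = [
    ("his bank card was fraudulently used at 19:00", True),
    ("he went to the bank to report the loss at 09:00", False),
]
label_known, score_known = discriminate(
    "his bank card was fraudulently used at 19:00", DATABASE)
label_unknown, score_unknown = discriminate("qqq", DATABASE)
```

The first call finds an identical stored clause and enlarges the database by one entry; the second shares no characters with any stored clause and is returned for manual handling.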
A cross-entropy loss function H(P, Q) is applied to the discrimination results of the discrimination model to measure their error and thereby improve the accuracy of the results.
Here X_ij is a text clause sample containing a time element; P(X_ij) is the probability that the time element in the text clause is the case occurrence time; Q(X_ij) is the probability that it is not, with P(X_ij) + Q(X_ij) = 1; m is the number of nodes in the hidden layer; and N is the number of text clause samples containing time elements. The smaller the value of H(P, Q), the smaller the error of the discrimination result. Whether a judged case occurrence time is accepted can therefore be decided by comparing the measured error with a set value. For example, if H(P, Q) is smaller than the set value, the result is accepted and the time element in the judged text clause is taken to be the case occurrence time; if H(P, Q) is not smaller than the set value, the result is not accepted, and whether that time element is the case occurrence time is decided manually.
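The expression for H(P, Q) is not reproduced in this text. A standard double-sum cross-entropy that is consistent with the symbols defined above would take the following form; it is given as an assumption, not as the patent's exact formula:

```latex
H(P, Q) = -\sum_{i=1}^{N} \sum_{j=1}^{m} P(X_{ij}) \, \log Q(X_{ij}),
\qquad P(X_{ij}) + Q(X_{ij}) = 1
```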
Because the extracted time elements contain Chinese characters, different expressions of time coexist. For example, "7 p.m. on the evening of July 21, 2020" and "6:49 on July 25, 2020" are both time elements, but "7 p.m. in the evening" and "6:49" are two different ways of expressing time. Left ununified, they hinder digital archiving and make case analysis and handling less convenient for the police. The case occurrence time therefore needs to be standardized so that it can be processed uniformly. In a further embodiment, step 4 is further:
Step 41: directly determining the "year, month, day, hour" time elements through the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}[hour|point]", then executing the next step;
Step 42: if "night", "afternoon" or "evening" appears in the time element text and the hour value obtained through "[0-9]{0,2}[hour|point]" is less than 12, adding 12 to the hour value;
Step 43: if the "day" element is missing from the time element and the time element contains "today", "yesterday" or "before" (the day before yesterday), obtaining the corresponding "day" element by counting back 0, 1 or 2 days, respectively, from the alarm time;
Step 44: if a single element of "year, month, day, hour" is missing from the time element, filling it with the corresponding element of the previous time element;
Step 45: standardizing the time elements into a standard case occurrence time in the 10-digit format "yyyymmddhh", in which digits 1-4 represent the year, digits 5-6 the month, digits 7-8 the day, and digits 9-10 the hour.
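Steps 41 to 45 can be sketched as below. This is an illustration under stated assumptions, not the patent's implementation: the Chinese literals (年, 月, 日, 时/点 for year, month, day, hour; 今, 昨, 前 for the relative days; 晚, 下午, 傍晚 for the period words) are guesses at the tokens behind the translated regular expressions, and the function name `normalize` and its parameters are hypothetical. The hour pattern also requires at least one digit, which is slightly stricter than the `{0,2}` quantifier in the text.

```python
from __future__ import annotations

import re
from datetime import datetime, timedelta

# Assumed Chinese literals behind the translated "year/month/day/hour" tokens.
YEAR = re.compile(r"([0-9]{4})年")
MONTH = re.compile(r"([0-9]{1,2})月")
DAY = re.compile(r"([0-9]{1,2})日")
HOUR = re.compile(r"([0-9]{1,2})[时点]")
PM_WORDS = ("晚", "下午", "傍晚")            # assumed "night/afternoon/evening"
RELATIVE_DAYS = {"今": 0, "昨": 1, "前": 2}  # today, yesterday, day before


def normalize(element: str, alarm: datetime, previous: datetime) -> str:
    """Turn one time element into the 10-digit 'yyyymmddhh' form."""

    def grab(pattern: re.Pattern) -> int | None:
        match = pattern.search(element)
        return int(match.group(1)) if match else None

    # Step 41: read year, month, day and hour directly where present.
    year, month, day, hour = grab(YEAR), grab(MONTH), grab(DAY), grab(HOUR)

    # Step 42: shift afternoon/evening hours onto the 24-hour clock.
    if hour is not None and hour < 12 and any(w in element for w in PM_WORDS):
        hour += 12

    # Step 43: resolve a relative day against the alarm time. The whole
    # date is taken from the shifted alarm time so that month boundaries
    # are handled; the text itself only mentions the "day" element.
    if day is None:
        for word, offset in RELATIVE_DAYS.items():
            if word in element:
                base = alarm - timedelta(days=offset)
                year, month, day = base.year, base.month, base.day
                break

    # Step 44: fill anything still missing from the previous time element.
    year = previous.year if year is None else year
    month = previous.month if month is None else month
    day = previous.day if day is None else day
    hour = previous.hour if hour is None else hour

    # Step 45: emit the standard 10-digit format.
    return f"{year:04d}{month:02d}{day:02d}{hour:02d}"


alarm_time = datetime(2020, 7, 26, 9)  # hypothetical alarm time
first = normalize("2020年7月21日晚上7点", alarm_time, alarm_time)
second = normalize("2020年7月25日6时49分", alarm_time, alarm_time)
third = normalize("8时10分", alarm_time, datetime(2020, 7, 25, 6))
```

In this sketch the first time element uses the alarm time as its "previous" element, which is one possible convention; the text does not say what to do when there is no preceding time element.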
Therefore, the case occurrence times "7 p.m. on the evening of July 21, 2020", "6:49 on July 25, 2020" and "8:10 on July 25, 2020" in the alert text are standardized to "2020072119", "2020072506" and "2020072508", respectively.
In alert text, an offence may correspond to several time points, and the intervals between them vary: some are days apart, others only hours. If every time point were treated as a separate case occurrence time, events separated by short intervals would lose their connection, making the case harder for the police to solve. Therefore, in a further embodiment, the standardized case occurrence times are merged, combining time points that are close together or clearly related, and the merged case occurrence times are labeled so that police officers can analyze the case more easily. The specific process of step 5 is therefore:
Step 51: judging whether two adjacent time elements appear in the same text clause; when they do and the former time is earlier than the latter, merging the standard case occurrence times corresponding to the two time elements into a case occurrence time period; otherwise executing the next step;
Step 52: calculating the difference in hours between two adjacent time elements; when they differ by less than 24 hours and the former time is earlier than the latter, merging the standard case occurrence times corresponding to the two time elements into a case occurrence time period; otherwise executing the next step;
Step 53: searching for keywords in the text clauses; when, for two adjacent time elements, the text clause of the former contains a keyword meaning "start", the text clause of the latter contains a keyword meaning "end", and the former time is earlier than the latter, merging the standard case occurrence times corresponding to the two time elements into a case occurrence time period; otherwise executing the next step;
Step 54: designating the standard case occurrence times corresponding to the remaining time elements as case occurrence time points;
Step 55: labeling the case occurrence time periods and time points in chronological order, for example as "first case occurrence time", "second case occurrence time", "third case occurrence time", and so on.
In this example, 7 p.m. on the evening of July 21, 2020 is a case occurrence time point, while 6:49 on July 25, 2020 and 8:10 on July 25, 2020 are merged into a case occurrence time period. After labeling, they can be expressed as the first case occurrence time "2020072119" and the second case occurrence time "2020072506-2020072508". This not only facilitates digital archiving but also spares police officers from memorizing the times: from the labeled times they can see the order of events, which makes case handling considerably more convenient and allows cases to be analyzed quickly and accurately.
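The merging and labeling of steps 51 to 55 can be sketched as follows. The sketch makes several assumptions that the text does not settle: adjacent elements are merged pairwise and greedily, the keyword lists `START_WORDS` and `END_WORDS` are placeholders for the actual "start" and "end" keywords, and the names `TimeElement`, `should_merge` and `merge_and_label` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta

START_WORDS = ("start", "begin")  # placeholder keywords (assumption)
END_WORDS = ("end", "finish")     # placeholder keywords (assumption)


@dataclass
class TimeElement:
    standard: str   # standardized time in "yyyymmddhh" form
    clause_id: int  # index of the text clause the element came from
    clause: str     # text of that clause


def parse(standard: str) -> datetime:
    return datetime.strptime(standard, "%Y%m%d%H")


def should_merge(a: TimeElement, b: TimeElement) -> bool:
    ta, tb = parse(a.standard), parse(b.standard)
    if not ta < tb:                      # every rule requires earlier -> later
        return False
    if a.clause_id == b.clause_id:       # step 51: same text clause
        return True
    if tb - ta < timedelta(hours=24):    # step 52: less than 24 hours apart
        return True
    # Step 53: "start" keyword in the former clause, "end" in the latter.
    return (any(w in a.clause for w in START_WORDS)
            and any(w in b.clause for w in END_WORDS))


def merge_and_label(elements: list[TimeElement]) -> list[tuple[int, str]]:
    merged: list[str] = []
    i = 0
    while i < len(elements):
        if i + 1 < len(elements) and should_merge(elements[i], elements[i + 1]):
            merged.append(f"{elements[i].standard}-{elements[i + 1].standard}")
            i += 2
        else:
            merged.append(elements[i].standard)  # step 54: a time point
            i += 1
    # Step 55: yyyymmddhh strings sort chronologically as plain text.
    merged.sort()
    return list(enumerate(merged, start=1))


labeled = merge_and_label([
    TimeElement("2020072119", 0, "phone stolen in the mall"),
    TimeElement("2020072506", 1, "suspect seen entering"),
    TimeElement("2020072508", 2, "suspect seen leaving"),
])
# labeled == [(1, "2020072119"), (2, "2020072506-2020072508")]
```

The numeric labels correspond to "first case occurrence time", "second case occurrence time", and so on.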
Example 2: Embodiment 2 of the invention provides a standard case occurrence time extraction system for alert text, which comprises a first module, a second module, a third module, a fourth module and a fifth module;
the first module is used for sequentially extracting the time elements in the alert text by means of named entity recognition;
the second module is used for dividing the alert text into a plurality of text clauses and constructing key-value pairs of the text clauses and the time elements;
the third module is used for establishing and training a case occurrence time identification model, and identifying the content expressed in the text clauses through the model to determine the case occurrence time;
the fourth module is used for standardizing the determined case occurrence time;
the fifth module is used for merging the standardized case occurrence times and labeling the merged case occurrence times;
the first, second, third, fourth and fifth modules of the system are used for implementing the standard case occurrence time extraction method for alert text of embodiment 1, so the technical effects of that method are likewise achieved by the system.
Example 3: Embodiment 3 of the present invention provides a computer processing system including a storage module; the storage module stores a computer program for implementing the standard case occurrence time extraction method for alert text according to any one of the embodiments. Because the computer processing system can be used to implement that method, the technical effects of the method are likewise achieved by the computer processing system.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.

Claims (6)

1. A method for extracting the standard case occurrence time of alert text, characterized by comprising the following steps:
Step 1: sequentially extracting the time elements in the alert text by means of named entity recognition;
Step 2: dividing the alert text into a plurality of text clauses, and constructing key-value pairs of the text clauses and the time elements;
Step 3: establishing and training a case occurrence time identification model, and identifying the content expressed in the text clauses through the model to determine the case occurrence time;
Step 4: standardizing the determined case occurrence time;
Step 5: merging the standardized case occurrence times and labeling the merged case occurrence times; wherein step 1 extracts the time elements with a regular expression, the specific process being as follows:
Step 11: first removing the content in brackets from the alert text, thereby removing the interfering time elements in the bracketed content;
Step 12: then extracting the time elements in the text with a regular expression, the regular expression being: ([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|before)?[\\u4E00-\\u9FA5]?(night|early|morning|afternoon|evening)?[\\u4E00-\\u9FA5]?([0-9]{0,2}[hour|point])?([0-9]{0,2}minute);
wherein:
([0-9]{4}year) represents four digits plus "year" and matches the year;
([0-9]{1,2}month) represents one or two digits plus "month" and matches the month;
([0-9]{1,2}day) represents one or two digits plus "day" and matches the day;
(today|yesterday|before)[\\u4E00-\\u9FA5] matches the relative date descriptions "today", "yesterday" and "the day before yesterday";
(night|early|morning|afternoon|evening)[\\u4E00-\\u9FA5] matches period descriptions such as "evening", "morning" and "afternoon";
([0-9]{1,2}[hour|point]) represents one or two digits plus "hour" or "point" and matches the specific hour;
([0-9]{1,2}minute) represents one or two digits plus "minute" and matches the specific minute;
step 2 is further as follows:
first, arranging the extracted time elements in the order in which they appear in the alert text, and taking the first time as the alarm time;
then, segmenting the alert text into a plurality of text clauses by regular-expression matching on punctuation;
finally, determining the text clause in which each time element other than the alarm time is located; if a text clause contains a time element and the clauses to its left and right do not, merging those left and right clauses with the clause containing the time element to form a new text clause; and constructing key-value pairs in which the time elements correspond one-to-one with the text clauses;
the case occurrence time identification model in step 3 comprises a pre-training model and a discrimination model;
the training data in the database are derived from historical alert records in which the case occurrence time has been manually annotated, and the case occurrence time in the alert text is determined by comparing the text clauses containing time elements with the training data; after discrimination, the text clause data are automatically labeled and added to the database;
the discrimination model comprises an input layer, a hidden layer and an output layer; the input layer receives the text clauses containing time elements obtained by segmenting the alert text, and its number of nodes equals the number of such text clauses; the hidden layer holds the data newly added to the database during pre-training together with the original data in the database; the output layer determines by comparison whether the time element in each text clause is the case occurrence time, and its number of nodes equals the number of text clauses to be judged; when an input text clause falls outside the coverage of the training database, it is processed manually and the processed data are fed into the database, so that the hidden-layer data grow as training proceeds;
the discrimination model performs an error calculation H(P, Q) on the discrimination result,
wherein X_ij is a text clause sample containing a time element, P(X_ij) is the probability that the time element in the text clause is the case occurrence time, Q(X_ij) is the probability that the time element in the text clause is not the case occurrence time, with P(X_ij) + Q(X_ij) = 1, m is the number of nodes of the hidden layer, and N is the number of text clause samples containing time elements; the smaller the value of H(P, Q), the smaller the error of the discrimination result.
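Steps 11 and 12 and the clause pairing of step 2 can be sketched together as below. This is an illustration only: the Chinese literals in `TIME_PATTERN` (年, 月, 日, 时/点, 分, 今/昨/前, and the period characters 晚/凌/早/上/下) are assumptions about the tokens behind the translated regular expression, empty and digit-free matches are filtered out because every part of the pattern is optional here, and a neighbouring clause is merged whenever it contains no time element, which is one reading of the merging rule. All function names are hypothetical.

```python
from __future__ import annotations

import re

# Bracketed content (full-width or ASCII brackets) is removed first.
BRACKETS = re.compile(r"[（(][^（()）]*[）)]")
# Clause boundaries: common Chinese and ASCII punctuation (assumption).
CLAUSE_SPLIT = re.compile(r"[，。；！？,;!?]")
# Assumed Chinese form of the translated time-element pattern.
TIME_PATTERN = re.compile(
    r"(?:[0-9]{4}年)?(?:[0-9]{1,2}月)?(?:[0-9]{1,2}日)?"
    r"(?:[今昨前][\u4E00-\u9FA5]?)?"
    r"(?:[晚凌早上下][\u4E00-\u9FA5]?)?"
    r"(?:[0-9]{1,2}[时点])?(?:[0-9]{1,2}分)?"
)


def find_times(text: str) -> list[str]:
    """All non-empty matches that contain at least one digit."""
    return [m.group(0) for m in TIME_PATTERN.finditer(text)
            if any(ch.isdigit() for ch in m.group(0))]


def extract_time_elements(alert_text: str) -> list[str]:
    """Steps 11-12: drop bracketed content, then extract time elements."""
    return find_times(BRACKETS.sub("", alert_text))


def pair_with_clauses(alert_text: str):
    """Step 2: return (alarm_time, [(time_element, text_clause), ...])."""
    cleaned = BRACKETS.sub("", alert_text)
    clauses = [c for c in CLAUSE_SPLIT.split(cleaned) if c.strip()]
    found = [find_times(c) for c in clauses]
    ordered = [(i, t) for i, times in enumerate(found) for t in times]
    if not ordered:
        return None, []
    alarm_time = ordered[0][1]  # the first time element is the alarm time
    pairs = []
    for i, element in ordered[1:]:
        # Merge neighbouring clauses that carry no time element of their own.
        left = clauses[i - 1] if i > 0 and not found[i - 1] else ""
        right = clauses[i + 1] if i + 1 < len(clauses) and not found[i + 1] else ""
        merged = "，".join(part for part in (left, clauses[i], right) if part)
        pairs.append((element, merged))
    return alarm_time, pairs


sample = ("2020年7月26日9时报警人来电（2020年7月1日曾报警），称其手机被盗，"
          "2020年7月21日晚上7点在商场，发现丢失")
alarm, pairs = pair_with_clauses(sample)
```

The sample sentence is invented for the sketch; the bracketed date in it is dropped before extraction, so it never becomes a time element.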
2. The method for extracting the standard case occurrence time of alert text according to claim 1, wherein step 4 is further:
Step 41: directly determining the "year, month, day, hour" time elements through the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}[hour|point]", then executing the next step;
Step 42: if "night", "afternoon" or "evening" appears in the time element text and the hour value obtained through "[0-9]{0,2}[hour|point]" is less than 12, adding 12 to the hour value;
Step 43: if the "day" element is missing from the time element and the time element contains "today", "yesterday" or "before" (the day before yesterday), obtaining the corresponding "day" element by counting back 0, 1 or 2 days, respectively, from the alarm time;
Step 44: if a single element of "year, month, day, hour" is missing from the time element, filling it with the corresponding element of the previous time element;
Step 45: standardizing the time elements into a standard case occurrence time in the 10-digit format "yyyymmddhh".
3. The method for extracting the standard case occurrence time of alert text according to claim 1, wherein step 5 is further:
Step 51: judging whether two adjacent time elements appear in the same text clause; when they do and the former time is earlier than the latter, merging the standard case occurrence times corresponding to the two time elements into a case occurrence time period; otherwise executing the next step;
Step 52: calculating the difference in hours between two adjacent time elements; when they differ by less than 24 hours and the former time is earlier than the latter, merging the standard case occurrence times corresponding to the two time elements into a case occurrence time period; otherwise executing the next step;
Step 53: searching for keywords in the text clauses; when, for two adjacent time elements, the text clause of the former contains a keyword meaning "start", the text clause of the latter contains a keyword meaning "end", and the former time is earlier than the latter, merging the standard case occurrence times corresponding to the two time elements into a case occurrence time period; otherwise executing the next step;
Step 54: designating the standard case occurrence times corresponding to the remaining time elements as case occurrence time points;
Step 55: labeling the case occurrence time periods and time points in chronological order.
4. A standard case occurrence time extraction system for alert text, characterized by comprising:
a first module for sequentially extracting the time elements in the alert text by means of named entity recognition;
a second module for segmenting the alert text into a plurality of text clauses and constructing key-value pairs of the text clauses and the time elements;
a third module for establishing and training a case occurrence time identification model, and identifying the content expressed in the text clauses through the model to determine the case occurrence time;
a fourth module for standardizing the determined case occurrence time;
a fifth module for merging the standardized case occurrence times and labeling the merged case occurrence times; wherein the first module extracts the time elements with a regular expression, first removing the content in brackets from the alert text, thereby removing the interfering time elements in the bracketed content, and then extracting the time elements in the text with the regular expression, the regular expression being:
([0-9]{4}year)?([0-9]{1,2}month)?([0-9]{1,2}day)?(today|yesterday|before)?[\\u4E00-\\u9FA5]?(night|early|morning|afternoon|evening)?[\\u4E00-\\u9FA5]?([0-9]{0,2}[hour|point])?([0-9]{0,2}minute);
wherein:
([0-9]{4}year) represents four digits plus "year" and matches the year;
([0-9]{1,2}month) represents one or two digits plus "month" and matches the month;
([0-9]{1,2}day) represents one or two digits plus "day" and matches the day;
(today|yesterday|before)[\\u4E00-\\u9FA5] matches the relative date descriptions "today", "yesterday" and "the day before yesterday";
(night|early|morning|afternoon|evening)[\\u4E00-\\u9FA5] matches period descriptions such as "evening", "morning" and "afternoon";
([0-9]{1,2}[hour|point]) represents one or two digits plus "hour" or "point" and matches the specific hour;
([0-9]{1,2}minute) represents one or two digits plus "minute" and matches the specific minute;
the second module first arranges the extracted time elements in the order in which they appear in the alert text and takes the first time as the alarm time; then segments the alert text into a plurality of text clauses by regular-expression matching on punctuation; finally determines the text clause in which each time element other than the alarm time is located; if a text clause contains a time element and the clauses to its left and right do not, merges those left and right clauses with the clause containing the time element to form a new text clause; and constructs key-value pairs in which the time elements correspond one-to-one with the text clauses;
the third module establishes and trains a case occurrence time identification model, wherein the case occurrence time identification model comprises a pre-training model and a discrimination model;
first, a database is established in the pre-training model, the training data in the database are derived from historical alert records in which the case occurrence time has been manually annotated, and the case occurrence time in the alert text is determined by comparing the text clauses containing time elements with the training data; after discrimination, the text clause data are automatically labeled and added to the database;
the discrimination model comprises an input layer, a hidden layer and an output layer; the input layer receives the text clauses containing time elements obtained by segmenting the alert text, and its number of nodes equals the number of such text clauses; the hidden layer holds the data newly added to the database during pre-training together with the original data in the database; the output layer determines by comparison whether the time element in each text clause is the case occurrence time, and its number of nodes equals the number of text clauses to be judged; when an input text clause falls outside the coverage of the training database, it is processed manually and the processed data are fed into the database, so that the hidden-layer data grow as training proceeds;
the discrimination model performs an error calculation H(P, Q) on the discrimination result,
wherein X_ij is a text clause sample containing a time element, P(X_ij) is the probability that the time element in the text clause is the case occurrence time, Q(X_ij) is the probability that the time element in the text clause is not the case occurrence time, with P(X_ij) + Q(X_ij) = 1, m is the number of nodes of the hidden layer, and N is the number of text clause samples containing time elements; the smaller the value of H(P, Q), the smaller the error of the discrimination result.
5. The standard case occurrence time extraction system for alert text according to claim 4, wherein
the fourth module directly determines the "year, month, day, hour" time elements through the regular expressions "[0-9]{4}year", "[0-9]{1,2}month", "[0-9]{1,2}day" and "[0-9]{0,2}[hour|point]"; when "night", "afternoon" or "evening" appears in the time element text and the hour value obtained through "[0-9]{0,2}[hour|point]" is less than 12, adds 12 to the hour value; when the "day" element is missing from the time element and the time element contains "today", "yesterday" or "before" (the day before yesterday), obtains the corresponding "day" element by counting back 0, 1 or 2 days, respectively, from the alarm time; when a single element of "year, month, day, hour" is missing from the time element, fills it with the corresponding element of the previous time element; and finally standardizes the time elements into a standard case occurrence time in the 10-digit format "yyyymmddhh";
the fifth module first judges whether two adjacent time elements appear in the same text clause, and when they do and the former time is earlier than the latter, merges the standard case occurrence times corresponding to the two time elements into a case occurrence time period; calculates the difference in hours between two adjacent time elements, and when they differ by less than 24 hours and the former time is earlier than the latter, merges the standard case occurrence times corresponding to the two time elements into a case occurrence time period; searches for keywords in the text clauses, and when, for two adjacent time elements, the text clause of the former contains a keyword meaning "start", the text clause of the latter contains a keyword meaning "end", and the former time is earlier than the latter, merges the standard case occurrence times corresponding to the two time elements into a case occurrence time period; designates the standard case occurrence times corresponding to the remaining time elements as case occurrence time points; and finally labels the case occurrence time periods and time points in chronological order.
6. A computer processing system comprising a storage module, wherein the storage module stores a computer program for implementing the standard case time extraction method of alert text according to any one of claims 1 to 3.
CN202011195667.3A 2020-10-30 2020-10-30 Standard case sending time extraction method and system for alert text Active CN112541075B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011195667.3A CN112541075B (en) 2020-10-30 2020-10-30 Standard case sending time extraction method and system for alert text

Publications (2)

Publication Number Publication Date
CN112541075A CN112541075A (en) 2021-03-23
CN112541075B true CN112541075B (en) 2024-04-05

Family

ID=75013660

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011195667.3A Active CN112541075B (en) 2020-10-30 2020-10-30 Standard case sending time extraction method and system for alert text

Country Status (1)

Country Link
CN (1) CN112541075B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116108163B (en) * 2023-04-04 2023-06-27 之江实验室 Text matching method, device, equipment and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108305050A (en) * 2018-02-08 2018-07-20 贵州小爱机器人科技有限公司 Information of reporting a case to the security authorities and the extracting method of service requirement information, device, equipment and medium
CN108920461A (en) * 2018-06-26 2018-11-30 武大吉奥信息技术有限公司 A kind of polymorphic type and entity abstracting method and device containing complex relationship
CN109472419A (en) * 2018-11-16 2019-03-15 中山大学 Method for building up, device and the storage medium of alert prediction model based on space-time
CN110287292A (en) * 2019-07-04 2019-09-27 科大讯飞股份有限公司 A kind of judge's measurement of penalty irrelevance prediction technique and device
CN110941702A (en) * 2019-11-26 2020-03-31 北京明略软件***有限公司 Retrieval method and device for laws and regulations and laws and readable storage medium
CN110990562A (en) * 2019-10-29 2020-04-10 新智认知数字科技股份有限公司 Alarm classification method and system
CN111047092A (en) * 2019-12-11 2020-04-21 深圳前海环融联易信息科技服务有限公司 Dispute case victory rate prediction method and device, computer equipment and storage medium
CN111062834A (en) * 2019-12-11 2020-04-24 深圳前海环融联易信息科技服务有限公司 Dispute case entity identification method and device, computer equipment and storage medium
CN111260223A (en) * 2020-01-17 2020-06-09 山东省计算中心(国家超级计算济南中心) Intelligent identification and early warning method, system, medium and equipment for trial and judgment risk
WO2020114373A1 (en) * 2018-12-07 2020-06-11 北京国双科技有限公司 Method and apparatus for realizing element recognition in judicial document
CN111274804A (en) * 2020-01-17 2020-06-12 珠海市新德汇信息技术有限公司 Case information extraction method based on named entity recognition
CN111680512A (en) * 2020-05-11 2020-09-18 上海阿尔卡特网络支援***有限公司 Named entity recognition model, telephone exchange switching extension method and system
CN111783420A (en) * 2020-06-19 2020-10-16 上海交通大学 Anti-complaint book element extraction method, system, medium and device based on BERT model

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10489439B2 (en) * 2016-04-14 2019-11-26 Xerox Corporation System and method for entity extraction from semi-structured text documents
US10957171B2 (en) * 2016-07-11 2021-03-23 Google Llc Methods and systems for providing event alerts
US11170016B2 (en) * 2017-07-29 2021-11-09 Splunk Inc. Navigating hierarchical components based on an expansion recommendation machine learning model
US20200126174A1 (en) * 2018-08-10 2020-04-23 Rapidsos, Inc. Social media analytics for emergency management

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
"A Survey on Deep Learning for Named Entity Recognition";Jing Li;《IEEE TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING》;20201030;第50-70页 *
"基于情景相似度的突发事件情报感知实现方法";杨峰 等;《情报学报》;20190531;第525-533页 *

Also Published As

Publication number Publication date
CN112541075A (en) 2021-03-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant