CN109508382B - Label labeling method and device and computer readable storage medium - Google Patents

Label labeling method and device and computer readable storage medium

Info

Publication number
CN109508382B
Authority
CN
China
Prior art keywords
entity
label
text
recorded
labeled
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811221612.8A
Other languages
Chinese (zh)
Other versions
CN109508382A (en)
Inventor
徐安华
张亚启
欧阳佑
路德龙
马瑞璇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Mininglamp Software System Co ltd
Original Assignee
Beijing Mininglamp Software System Co ltd
Application filed by Beijing Mininglamp Software System Co ltd
Priority to CN201811221612.8A
Publication of CN109508382A
Application granted
Publication of CN109508382B

Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The application discloses a label labeling method and device and a computer-readable storage medium. The method comprises the following steps: detecting whether an entity in a text to be labeled is a pre-recorded entity; if the entity is a pre-recorded entity, acquiring a tag type state chain corresponding to the pre-recorded entity, wherein the tag type state chain stores a sequence of previously labeled tag types; and automatically labeling the tag type of the entity in the text to be labeled according to the tag type state chain. Because the tag types of entities in the text to be labeled are labeled automatically from the tag type state chains of pre-recorded entities, repeated entities are labeled automatically and effectively, label labeling efficiency is greatly improved, the workload of labeling personnel is reduced, and user friendliness is greatly improved.

Description

Label labeling method and device and computer readable storage medium
Technical Field
The present invention relates to the field of Natural Language Processing (NLP) technology, and in particular, to a method and an apparatus for labeling labels, and a computer-readable storage medium.
Background
With the popularity of big data and Artificial Intelligence (AI), natural language processing technologies are increasingly used in enterprise-level applications. Currently, many large companies provide Hypertext Transfer Protocol (HTTP) services backed by models for part-of-speech recognition, entity recognition, relationship recognition, and the like, but most of the natural language processing models behind these services are trained on internet data. The sources of internet text are broad: they include both content from professional media and content written by individual internet users. Compared with text inside an enterprise, internet text differs considerably in vocabulary and writing style. Therefore, to achieve good results in enterprise-level applications, the enterprise's own text generally needs to be labeled and then used to retrain a natural language processing model suited to the enterprise's needs.
The more important NLP tasks, such as part-of-speech recognition and entity recognition, all require labeling the enterprise's text data and then training a model. In entity labeling, many already labeled entities appear repeatedly across different texts. When a labeled entity appears again, there is therefore a high probability that it should also be labeled in the new text. Learning a labeling strategy from historical labels is necessary to spare the user a large number of repetitive operations, which raises the problem of assisted labeling of repeated entities.
Disclosure of Invention
The embodiment of the invention provides a label labeling method and device and a computer readable storage medium, which can greatly improve the label labeling efficiency.
In order to solve the above technical problem, the technical solution of the embodiment of the present invention is implemented as follows:
the embodiment of the invention provides a label labeling method, which comprises the following steps:
detecting whether an entity in a text to be labeled is a pre-recorded entity;
if the entity is a pre-recorded entity, acquiring a tag type state chain corresponding to the pre-recorded entity, wherein the tag type state chain is used for storing a tagged tag type sequence; and automatically labeling the type of the label for the entity in the text to be labeled according to the label type state chain.
In an embodiment, assuming that the length of the tag type state chain is M, where M is a natural number, automatically labeling the tag type of the entity in the text to be labeled according to the tag type state chain includes:
counting the number of occurrences N of the pre-recorded entity in the text to be labeled, wherein N is a natural number;
if N is less than or equal to M, sequentially labeling the N occurrences of the pre-recorded entity in the text to be labeled with the first N tag types in the tag type state chain;
if N is greater than M, sequentially labeling the first M occurrences of the pre-recorded entity in the text to be labeled with the M tag types in the tag type state chain, and labeling the (M+1)th to Nth occurrences of the pre-recorded entity with the Mth tag type in the tag type state chain.
In an embodiment, before the detecting step, the method further comprises:
performing word segmentation on the text to be labeled according to the pre-recorded entities.
In an embodiment, word segmentation of the text to be labeled is performed with a forward maximum matching algorithm, specifically: the pre-recorded entities are used as the word segmentation dictionary, and the longest run of consecutive characters in the text to be labeled that matches the dictionary is taken as the selected word.
In an embodiment, the method further comprises:
detecting whether the tag type of an entity in the text to be labeled has been updated and whether the entity whose tag type was updated is a pre-recorded entity;
if the tag type of an entity in the text to be labeled has been updated and the entity is not a pre-recorded entity, recording the entity and a tag type state chain corresponding to the entity;
and if the tag type of an entity in the text to be labeled has been updated and the entity is a pre-recorded entity, modifying the tag type state chain corresponding to the entity according to the updated tag type.
In an embodiment, when the ith occurrence of a recorded entity in the text to be labeled has no tag type labeled, the ith tag type in the tag type state chain corresponding to the recorded entity is null, where i is a natural number.
Embodiments of the present invention also provide a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps of the label labeling method as described in any one of the above.
The embodiment of the invention also provides a label labeling device, which comprises a processor and a memory, wherein:
the processor is configured to execute a tag labeling program stored in the memory to implement the steps of the tag labeling method according to any one of the above.
The embodiment of the invention also provides a label labeling device, which comprises a storage module, a detection module and an automatic labeling module, wherein:
the storage module is used for storing a pre-recorded entity and a tag type state chain corresponding to the entity, wherein the tag type state chain is used for storing a tagged tag type sequence;
the detection module is used for detecting whether the entity in the text to be labeled is the entity pre-recorded in the storage module, and if the entity is the pre-recorded entity, the detection module informs the automatic labeling module;
and the automatic labeling module is used for receiving the notification of the detection module, acquiring a label type state chain corresponding to the entity recorded in the storage module in advance, and automatically labeling the label type of the entity in the text to be labeled according to the label type state chain.
In an embodiment, the label labeling apparatus further includes a recording module, wherein:
the detection module is further configured to detect whether an entity in the text to be labeled updates the tag type and whether the entity updating the tag type is the pre-recorded entity, and send a first notification to the recording module if the entity in the text to be labeled updates the tag type and the entity updating the tag type is not the pre-recorded entity of the storage module; if the entity in the text to be labeled updates the label type and the entity updating the label type is the entity recorded in advance by the storage module, sending a second notification to the recording module;
the recording module is used for receiving the first notice of the detection module and recording the entity and the corresponding label type state chain thereof to the storage module; and receiving a second notification of the detection module, and correspondingly modifying the tag type state chain corresponding to the entity recorded in the storage module according to the updated tag type.
The technical scheme of the embodiment of the invention has the following beneficial effects:
according to the label labeling method and device and the computer-readable storage medium provided by the embodiments of the invention, the tag type of an entity in the text to be labeled is labeled automatically according to the tag type state chain corresponding to the pre-recorded entity, so that repeated entities are labeled automatically and effectively, label labeling efficiency is greatly improved, the workload of labeling personnel is reduced, and user friendliness is greatly improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic flow chart of a label labeling method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a labeled text structure according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a partial tag type status chain of the text record in FIG. 2;
FIG. 4 is a schematic diagram of another labeled text structure according to an embodiment of the present invention;
FIG. 5 is a structural diagram of a tag type status chain of the text record in FIG. 4;
FIG. 6 is a schematic structural diagram of a label labeling apparatus according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of another label labeling apparatus according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of yet another label labeling apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail below with reference to the accompanying drawings. It should be noted that the embodiments and features of the embodiments in the present application may be arbitrarily combined with each other without conflict.
Natural language processing is a general term for a broad class of problems involving processing, converting, and extracting information from data such as speech and text. Entity: here the term mainly refers to entities in the sense of Named Entity Recognition (NER) in the field of natural language processing, but is not limited to named entities. Relationship: here the term mainly refers to relationships between entities in the field of natural language processing. Entity recognition: extracting entities with certain semantic information, such as person names, dates, places, and organizations, from the input text. Relationship recognition: extracting relationships with certain semantic information between entities, such as parent-child, employment, and geographic relationships, from the input text. Training: in the field of machine learning, the process by which a machine updates model parameters according to training data and a loss function. Chinese Word Segmentation (CWS): segmenting a sequence of Chinese characters into individual words. Word segmentation is the process of recombining a continuous character sequence into a word sequence according to a certain specification.
As shown in fig. 1, a label labeling method according to an embodiment of the present invention includes the following steps:
Step 101: detecting whether an entity in a text to be labeled is a pre-recorded entity;
Illustratively, a labeled text structure is shown in FIG. 2, and some of the recorded entities and their corresponding tag type state chains are shown in FIG. 3. The recorded entities include "tesla", "radio-controlled ship", "radio remote control technology", "radio communication", and the like, and each recorded entity corresponds to its own tag type state chain.
In an embodiment of the present invention, before step 101, the method further includes:
performing word segmentation on the text to be labeled according to the pre-recorded entities.
Before the text to be labeled is segmented, a word list is compiled in advance in which each word is a pre-recorded entity; during segmentation, the words of the word list are located one by one in the text to be labeled.
In an embodiment of the present invention, word segmentation of the text to be labeled is performed with a forward maximum matching algorithm, specifically: the pre-recorded entities are used as the word segmentation dictionary, and the longest run of consecutive characters in the text to be labeled that matches the dictionary is taken as the selected word.
Illustratively, consider the sentence in FIG. 2, "he manufactured the first radio-controlled ship in the world and patented the radio remote control technology". Although the word "radio" is already in the word list, the characters that follow it combine with "radio" to form the longer words "radio-controlled ship" and "radio remote control technology", so the selected words are "radio-controlled ship" and "radio remote control technology" (that is, the result of forward maximum matching).
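The forward maximum matching step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `forward_max_match`, the character-by-character scan, and the English rendering of the FIG. 2 sentence are all hypothetical choices made for the example.

```python
def forward_max_match(text, dictionary):
    """Scan `text` left to right; at each position take the longest
    dictionary entry that starts there.

    Returns a list of (start, end, word) spans. Characters that do not
    start any dictionary entry are skipped one at a time.
    """
    max_len = max((len(w) for w in dictionary), default=0)
    spans = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest possible candidate first, then shrink.
        for length in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + length]
            if candidate in dictionary:
                match = candidate
                break
        if match is None:
            i += 1  # no recorded entity starts here
        else:
            spans.append((i, i + len(match), match))
            i += len(match)  # resume after the matched entity
    return spans


# Hypothetical word list and English rendering of the FIG. 2 sentence.
entities = {"radio", "radio-controlled ship", "radio remote control technology"}
sentence = ("he manufactured the first radio-controlled ship in the world "
            "and patented the radio remote control technology")
words = [word for _, _, word in forward_max_match(sentence, entities)]
print(words)  # ['radio-controlled ship', 'radio remote control technology']
```

Because the longest candidate is tried first, the shorter entry "radio" is only selected when no longer recorded entity starts at the same position.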
Step 102: if the entity is a pre-recorded entity, acquiring a tag type state chain corresponding to the pre-recorded entity, wherein the tag type state chain is used for storing a tagged tag type sequence; and automatically labeling the type of the label for the entity in the text to be labeled according to the label type state chain.
In an embodiment of the present invention, assuming that the length of the tag type state chain is M, where M is a natural number, automatically labeling the tag type of the entity in the text to be labeled according to the tag type state chain in step 102 includes:
counting the number of occurrences N of the pre-recorded entity in the text to be labeled, wherein N is a natural number;
if N is less than or equal to M, sequentially labeling the N occurrences of the pre-recorded entity in the text to be labeled with the first N tag types in the tag type state chain;
if N is greater than M, sequentially labeling the first M occurrences of the pre-recorded entity in the text to be labeled with the M tag types in the tag type state chain, and labeling the (M+1)th to Nth occurrences of the pre-recorded entity with the Mth tag type in the tag type state chain.
For example, FIG. 4 shows a labeled text, and FIG. 5 shows the tag type state chain recorded for the entity "watt". When the next text to be labeled is labeled, the first occurrence of "watt" is labeled as a unit (Unit), the second occurrence as a company (Company), and the third and subsequent occurrences as a name (Name).
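The N versus M rule can be expressed compactly in code. The sketch below is illustrative only: the function name `labels_for_occurrences` is hypothetical, indices are 0-based, and the three-element "watt" chain is a shortened chain assumed for the example rather than the exact contents of FIG. 5.

```python
def labels_for_occurrences(chain, n):
    """Return the tag types for n occurrences of one recorded entity.

    chain: the entity's tag type state chain (length M); entries may be
           None when an occurrence was left unlabeled.
    Occurrence i (0-based) receives chain[i] while i < M; every later
    occurrence reuses the last (M-th) tag type.
    """
    if not chain or n <= 0:
        return []  # nothing recorded, or nothing to label
    last = len(chain) - 1
    return [chain[min(i, last)] for i in range(n)]


watt_chain = ["Unit", "Company", "Name"]  # assumed, shortened chain
print(labels_for_occurrences(watt_chain, 2))  # N <= M: ['Unit', 'Company']
print(labels_for_occurrences(watt_chain, 5))  # N > M: ['Unit', 'Company', 'Name', 'Name', 'Name']
```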
In an embodiment of the present invention, the method further includes:
detecting whether the tag type of an entity in the text to be labeled has been updated and whether the entity whose tag type was updated is a pre-recorded entity;
if the tag type of an entity in the text to be labeled has been updated and the entity is not a pre-recorded entity, recording the entity and a tag type state chain corresponding to the entity;
and if the tag type of an entity in the text to be labeled has been updated and the entity is a pre-recorded entity, modifying the tag type state chain corresponding to the entity according to the updated tag type.
When the label labeling method of the embodiment of the invention is used, if a recorded tag type state chain already exists for the entity currently being labeled, the original record is updated or overwritten: when the existing tag type state chain is no longer than the new tag type state chain, the existing chain is overwritten by the new chain; when the existing chain is longer than the new chain, only the leading portion of the existing chain, equal in length to the new chain, is updated with the new chain. If no recorded tag type state chain exists for the entity, a new record is created.
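The update-or-overwrite rule can be sketched as follows. The names `merge_chain` and `record_chain` and the use of a plain dictionary as the record store are assumptions made for illustration; the source does not specify a storage format.

```python
def merge_chain(existing, new):
    """Combine an existing state chain with a newly observed one.

    - existing no longer than new: the new chain replaces it entirely.
    - existing longer than new: only the leading len(new) entries are
      replaced; the tail of the existing chain is kept.
    """
    if len(existing) <= len(new):
        return list(new)
    return list(new) + list(existing[len(new):])


def record_chain(store, entity, new_chain):
    """Create a record for an unseen entity, otherwise merge into it."""
    if entity not in store:
        store[entity] = list(new_chain)
    else:
        store[entity] = merge_chain(store[entity], new_chain)
    return store[entity]


store = {}
record_chain(store, "watt", ["Unit", "Company", "Name", "Name"])
record_chain(store, "watt", ["Name", "Unit"])  # shorter chain: update the front only
print(store["watt"])  # ['Name', 'Unit', 'Name', 'Name']
```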
Exemplarily, let Mention be the array of entities to be labeled in a text, where Mention[j] is the jth entity to be labeled, generally a character string in the text. Tag is the array of tag types, where Tag[k] is the name of the kth tag type; tag types may be name, address, date, company, invention, unit, and so on. Each occurrence of Mention[j] in the text corresponds to one Tag[k], which may be empty. Position records where the entities to be labeled occur; the kth occurrence of the jth entity in the text is recorded as Position[j][k]. MentionTag[j][k] denotes the tag type at the kth occurrence of the jth entity in the text, where j and k are integers greater than or equal to 0. The original record is updated, or a new record is created, as follows:
for every position Position[j][k] at which Mention[j] appears, the state MentionTag[j][k] is obtained: if the occurrence is not labeled, MentionTag[j][k] = None; if it is labeled as Tag[s] or modified to Tag[s], where s is an integer greater than or equal to 0, MentionTag[j][k] = Tag[s]. For the entity "watt" in FIG. 4, the recorded entity, tag types, and tag type state chain are as follows:
Mention[0] = 'watt'; Position[0][k] (k = 0 to 6); Tag[0] = 'Unit'; Tag[1] = 'Company'; Tag[2] = 'Name'; MentionTag[0][0] = Tag[0]; MentionTag[0][1] = Tag[1]; MentionTag[0][2] = Tag[2]; MentionTag[0][3] = Tag[2]; MentionTag[0][4] = Tag[2]; MentionTag[0][5] = Tag[2]; MentionTag[0][6] = Tag[2].
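As a rough illustration of how such state chains might be collected from an already labeled text, the sketch below groups annotations by entity and orders them by position. The function name `build_state_chains`, the tuple layout, and the character offsets are hypothetical; only the idea that unlabeled occurrences are stored as None comes from the description.

```python
from collections import defaultdict


def build_state_chains(annotations):
    """Build one tag type state chain per entity.

    annotations: iterable of (entity, position, tag) tuples, where tag is
                 None for an occurrence that carries no label.
    Returns {entity: [tag at 1st occurrence, tag at 2nd occurrence, ...]},
    with occurrences ordered by their position in the text.
    """
    by_entity = defaultdict(list)
    for entity, position, tag in annotations:
        by_entity[entity].append((position, tag))
    return {
        entity: [tag for _, tag in sorted(items, key=lambda item: item[0])]
        for entity, items in by_entity.items()
    }


# Hypothetical offsets; the annotations need not arrive in text order.
annotations = [
    ("watt", 120, "Name"),
    ("watt", 3, "Unit"),
    ("tesla", 10, "Name"),
    ("watt", 47, "Company"),
    ("watt", 90, None),  # occurrence left unlabeled
]
chains = build_state_chains(annotations)
print(chains["watt"])  # ['Unit', 'Company', None, 'Name']
```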
When the currently recorded tag type state chains are applied to a text to be labeled, all recorded entity names Mention[j] are located first. To do so, forward maximum matching is run with all recorded Mention[j] over the unlabeled content of the full text (already labeled content may be empty or may hold the results of applying other, higher-priority tag type state chains), which yields the positions Position[j][k] of all occurrences of Mention[j]; the contents stored in the currently recorded tag type state chain MentionTag[j][k] are then used for labeling.
When MentionTag[j][k] is applied to each Mention[j] in the text: if the number of occurrences k1 of Mention[j] in the text is less than or equal to the length k2 of the tag type state chain, the first k1 tag types in the chain are used directly to label the occurrences Mention[j][0] to Mention[j][k1-1] in order; if k1 is greater than k2, the k2 tag types in the chain are used to label Mention[j][0] to Mention[j][k2-1] in order, and the k2-th tag type, MentionTag[j][k2-1], is used to label Mention[j][k2] to Mention[j][k1-1], where k1 and k2 are natural numbers.
In an embodiment of the present invention, when the ith occurrence of a recorded entity in the text to be labeled has no tag type labeled (the tag type may have been deleted, or the occurrence may never have been labeled), the ith tag type in the tag type state chain corresponding to the recorded entity is null, where i is a natural number.
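One way to represent null entries is to store None in the chain. The helpers below are a hypothetical sketch: the names `set_label` and `clear_label`, the 0-based index, and the padding of missing positions with None are assumptions, since the source only states that the entry for an unlabeled occurrence is null.

```python
def set_label(chain, i, tag):
    """Set the tag type of the i-th occurrence (0-based) in place.

    Positions before i that were never recorded are padded with None,
    meaning "no tag type labeled".
    """
    if i >= len(chain):
        chain.extend([None] * (i + 1 - len(chain)))
    chain[i] = tag
    return chain


def clear_label(chain, i):
    """Mark the i-th occurrence as unlabeled, e.g. after a label is deleted."""
    if i < len(chain):
        chain[i] = None
    return chain


chain = []
set_label(chain, 2, "Name")  # only the third occurrence is labeled
set_label(chain, 0, "Unit")
clear_label(chain, 2)        # the label on the third occurrence is deleted
print(chain)  # ['Unit', None, None]
```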
Embodiments of the present invention also provide a computer-readable storage medium storing one or more programs, which are executable by one or more processors to implement the steps of the label labeling method as described in any one of the above.
The embodiment of the invention also provides a label labeling device, which comprises a processor and a memory, wherein:
the processor is configured to execute a tag labeling program stored in the memory to implement the steps of the tag labeling method according to any one of the above.
As shown in fig. 6, a label labeling apparatus according to an embodiment of the present invention includes a storage module 601, a detection module 602, and an automatic labeling module 603, wherein:
a storage module 601, configured to store a pre-recorded entity and a tag type state chain corresponding to the entity, where the tag type state chain is used to store a tagged tag type sequence;
a detecting module 602, configured to detect whether an entity in the text to be labeled is a pre-recorded entity in the storage module 601, and if the entity is the pre-recorded entity, notify the automatic labeling module 603;
the automatic labeling module 603 is configured to receive the notification from the detecting module 602, acquire a tag type state chain corresponding to the entity pre-recorded in the storage module 601, and automatically label the tag type of the entity in the text to be labeled according to the tag type state chain.
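The cooperation of the three modules can be sketched with small classes. This is an assumed structure for illustration only: the class and method names are hypothetical, and the "notification" is modeled as a direct method call from the detection module to the automatic labeling module.

```python
class StorageModule:
    """Stores pre-recorded entities and their tag type state chains."""

    def __init__(self):
        self._chains = {}

    def record(self, entity, chain):
        self._chains[entity] = list(chain)

    def contains(self, entity):
        return entity in self._chains

    def chain_for(self, entity):
        return list(self._chains[entity])


class AutoLabelingModule:
    """Labels occurrences of an entity from its recorded state chain."""

    def __init__(self, storage):
        self._storage = storage

    def notify(self, entity, occurrences):
        chain = self._storage.chain_for(entity)
        if not chain:
            return []
        last = len(chain) - 1
        # Occurrences beyond the chain length reuse the last tag type.
        return [chain[min(i, last)] for i in range(occurrences)]


class DetectionModule:
    """Checks whether an entity is pre-recorded and notifies the labeler."""

    def __init__(self, storage, labeler):
        self._storage = storage
        self._labeler = labeler

    def detect(self, entity, occurrences):
        if self._storage.contains(entity):
            return self._labeler.notify(entity, occurrences)
        return None  # not pre-recorded: nothing is labeled automatically


storage = StorageModule()
storage.record("watt", ["Unit", "Company", "Name"])
detector = DetectionModule(storage, AutoLabelingModule(storage))
print(detector.detect("watt", 4))  # ['Unit', 'Company', 'Name', 'Name']
print(detector.detect("volt", 1))  # None
```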
Illustratively, a labeled text structure is shown in FIG. 2, and some of the recorded entities and their corresponding tag type state chains are shown in FIG. 3. The recorded entities include "tesla", "radio-controlled ship", "radio remote control technology", "radio communication", and the like, and each recorded entity corresponds to its own tag type state chain.
In an embodiment of the present invention, assuming that the length of the tag type state chain is M, where M is a natural number, the automatic labeling module 603 automatically labeling the tag type of the entity in the text to be labeled according to the tag type state chain includes:
counting the number of occurrences N of the pre-recorded entity in the text to be labeled, wherein N is a natural number;
if N is less than or equal to M, sequentially labeling the N occurrences of the pre-recorded entity in the text to be labeled with the first N tag types in the tag type state chain;
if N is greater than M, sequentially labeling the first M occurrences of the pre-recorded entity in the text to be labeled with the M tag types in the tag type state chain, and labeling the (M+1)th to Nth occurrences of the pre-recorded entity with the Mth tag type in the tag type state chain.
For example, FIG. 4 shows a labeled text, and FIG. 5 shows the tag type state chain recorded for the entity "watt". When the automatic labeling module 603 labels the next text to be labeled, the first occurrence of "watt" is labeled as a unit (Unit), the second occurrence as a company (Company), and the third and subsequent occurrences as a name (Name).
In an embodiment of the present invention, as shown in fig. 7, the label labeling apparatus further includes a word segmentation module 604, wherein:
the word segmentation module 604 is configured to perform word segmentation on the text to be labeled according to an entity pre-recorded in the storage module 601.
The word segmentation module 604 compiles a word list in advance, in which each word is a pre-recorded entity, and locates the words of the word list one by one in the text to be labeled.
In this embodiment, the word segmentation module 604 segments the text to be labeled with a forward maximum matching algorithm, specifically: the pre-recorded entities are used as the word segmentation dictionary, and the longest run of consecutive characters in the text to be labeled that matches the dictionary is taken as the selected word.
Illustratively, consider the sentence in FIG. 2, "he manufactured the first radio-controlled ship in the world and patented the radio remote control technology". Although the word "radio" is already in the word list, the characters that follow it combine with "radio" to form the longer words "radio-controlled ship" and "radio remote control technology", so the words selected by the word segmentation module 604 are "radio-controlled ship" and "radio remote control technology" (that is, the result of forward maximum matching).
In an embodiment of the present invention, as shown in fig. 8, the label labeling apparatus further includes a recording module 605, wherein:
the detecting module 602 is further configured to detect whether an entity in the text to be labeled updates a tag type and whether the entity updating the tag type is the pre-recorded entity, and send a first notification to the recording module 605 if the entity in the text to be labeled updates the tag type and the entity updating the tag type is not the pre-recorded entity of the storing module 601; if the entity in the text to be labeled updates the tag type and the entity updating the tag type is the entity pre-recorded by the storage module 601, sending a second notification to the recording module 605;
a recording module 605, configured to receive the first notification from the detecting module 602, and record the entity and the tag type state chain corresponding to the entity to the storing module 601; receiving a second notification from the detection module 602, and according to the updated tag type, correspondingly modifying the tag type state chain corresponding to the entity recorded in the storage module 601.
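The two notifications and the recording module's reaction to them might look like the following. This is a hedged sketch: the `Notification` enum, the function names, the dictionary store, and the single-occurrence update signature `(entity, index, tag)` are all assumptions introduced for the example.

```python
from enum import Enum


class Notification(Enum):
    FIRST = "updated entity is not yet recorded"
    SECOND = "updated entity is already recorded"


def detect_update(store, entity):
    """Detection side: decide which notification to send for an update."""
    return Notification.SECOND if entity in store else Notification.FIRST


def record_update(store, notification, entity, index, tag):
    """Recording side: create or modify the entity's state chain.

    index is the 0-based occurrence whose tag type was updated.
    """
    if notification is Notification.FIRST:
        # New record: occurrences before `index` carry no label yet.
        store[entity] = [None] * index + [tag]
    else:
        chain = store[entity]
        if index >= len(chain):
            chain.extend([None] * (index + 1 - len(chain)))
        chain[index] = tag
    return store[entity]


store = {}
first = detect_update(store, "watt")
record_update(store, first, "watt", 0, "Unit")
second = detect_update(store, "watt")
record_update(store, second, "watt", 1, "Company")
print(store["watt"])  # ['Unit', 'Company']
```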
When the label labeling apparatus of the embodiment of the present invention is used, if a recorded tag type state chain already exists for the entity currently being labeled, the recording module 605 updates or overwrites the original record: when the existing tag type state chain is no longer than the new tag type state chain, the existing chain is overwritten by the new chain; when the existing chain is longer than the new chain, only the leading portion of the existing chain, equal in length to the new chain, is updated with the new chain. If no record exists, the recording module 605 creates a new record.
Exemplarily, let Mention be the array of entities to be labeled in a text, where Mention[j] is the jth entity to be labeled, generally a character string in the text. Tag is the array of tag types, where Tag[k] is the name of the kth tag type; tag types may be name, address, date, company, invention, unit, and so on. Each occurrence of Mention[j] in the text corresponds to one Tag[k], which may be empty. Position records where the entities to be labeled occur; the kth occurrence of the jth entity in the text is recorded as Position[j][k]. MentionTag[j][k] denotes the tag type at the kth occurrence of the jth entity in the text, where j and k are integers greater than or equal to 0. The recording module 605 updates the original record, or creates a new record, as follows:
for every position Position[j][k] at which Mention[j] appears, the state MentionTag[j][k] is obtained: if the occurrence is not labeled, MentionTag[j][k] = None; if it is labeled as Tag[s] or modified to Tag[s], where s is an integer greater than or equal to 0, MentionTag[j][k] = Tag[s]. For the entity "watt" in FIG. 4, the recorded entity, tag types, and tag type state chain are as follows:
Mention[0] = 'watt'; Position[0][k] (k = 0 to 6); Tag[0] = 'Unit'; Tag[1] = 'Company'; Tag[2] = 'Name'; MentionTag[0][0] = Tag[0]; MentionTag[0][1] = Tag[1]; MentionTag[0][2] = Tag[2]; MentionTag[0][3] = Tag[2]; MentionTag[0][4] = Tag[2]; MentionTag[0][5] = Tag[2]; MentionTag[0][6] = Tag[2].
When the currently recorded tag type state chains are applied to a text to be labeled, all recorded entity names Mention[j] are located first. To do so, forward maximum matching is run with all recorded Mention[j] over the unlabeled content of the full text (already labeled content may be empty or may hold the results of applying other, higher-priority tag type state chains), which yields the positions Position[j][k] of all occurrences of Mention[j]; the contents stored in the currently recorded tag type state chain MentionTag[j][k] are then used for labeling.
When MentionTag[j][k] is applied to each Mention[j] in the text: if the number of occurrences k1 of Mention[j] in the text is less than or equal to the length k2 of the tag type state chain, the first k1 tag types in the chain are used directly to label the occurrences Mention[j][0] to Mention[j][k1-1] in order; if k1 is greater than k2, the k2 tag types in the chain are used to label Mention[j][0] to Mention[j][k2-1] in order, and the k2-th tag type, MentionTag[j][k2-1], is used to label Mention[j][k2] to Mention[j][k1-1], where k1 and k2 are natural numbers.
In an embodiment of the present invention, when the i-th recorded occurrence of an entity in the text to be labeled has no tag type labeled (the tag type of the entity may have been deleted, or may never have been labeled), the i-th tag type in the tag type state chain that the recording module 605 records for the entity is empty, where i is a natural number.
It will be understood by those skilled in the art that all or part of the steps of the above methods may be implemented by instructing the relevant hardware through a program, and the program may be stored in a computer readable storage medium, such as a read-only memory, a magnetic or optical disk, and the like. Alternatively, all or part of the steps of the above embodiments may be implemented using one or more integrated circuits. Accordingly, each module/unit in the above embodiments may be implemented in the form of hardware, and may also be implemented in the form of a software functional module. The present invention is not limited to any specific form of combination of hardware and software.
The foregoing is only a preferred embodiment of the present invention, and naturally there are many other embodiments of the present invention, and those skilled in the art can make various corresponding changes and modifications according to the present invention without departing from the spirit and the essence of the present invention, and these corresponding changes and modifications should fall within the scope of the appended claims.

Claims (10)

1. A label labeling method, characterized by comprising the following steps:
detecting whether an entity in a text to be labeled is a pre-recorded entity;
if the entity is a pre-recorded entity, acquiring a tag type state chain corresponding to the pre-recorded entity, wherein the tag type state chain is used for storing a sequence of labeled tag types; and
automatically labeling a tag type for the entity in the text to be labeled according to the tag type state chain.
2. The method according to claim 1, wherein the length of the tag type state chain is M, M being a natural number, and automatically labeling a tag type for the entity in the text to be labeled according to the tag type state chain comprises:
counting the number of occurrences N of the pre-recorded entity in the text to be labeled, wherein N is a natural number;
if N is less than or equal to M, sequentially labeling the N occurrences of the pre-recorded entity in the text to be labeled with the first N tag types in the tag type state chain;
if N is greater than M, sequentially labeling the first M occurrences of the pre-recorded entity in the text to be labeled with the M tag types in the tag type state chain, and labeling the (M+1)-th to N-th occurrences of the pre-recorded entity in the text to be labeled with the M-th tag type in the tag type state chain.
3. The method of claim 1, further comprising, before the steps of the method:
performing word segmentation on the text to be labeled according to the pre-recorded entities.
4. The method according to claim 3, wherein the text to be labeled is segmented using a forward maximum matching algorithm, the forward maximum matching algorithm specifically comprising: taking the pre-recorded entities as a word segmentation dictionary, and selecting as a segment the longest run of consecutive characters in the text to be labeled that matches the word segmentation dictionary.
5. The method of claim 1, further comprising:
detecting whether the tag type of an entity in the text to be labeled is updated, and whether the entity whose tag type is updated is a pre-recorded entity;
if the tag type of an entity in the text to be labeled is updated and that entity is not a pre-recorded entity, recording the entity and a tag type state chain corresponding to the entity;
if the tag type of an entity in the text to be labeled is updated and that entity is a pre-recorded entity, modifying the tag type state chain corresponding to the entity according to the updated tag type.
6. The method according to claim 5, wherein when the i-th recorded occurrence of an entity in the text to be labeled has no tag type labeled, the i-th tag type in the tag type state chain recorded for the entity is null, where i is a natural number.
7. A computer readable storage medium, characterized in that the computer readable storage medium stores one or more programs which are executable by one or more processors to implement the steps of the label labeling method as claimed in any one of claims 1 to 6.
8. A label labeling apparatus, comprising a processor and a memory, wherein:
the processor is configured to execute a tag labeling program stored in the memory to implement the steps of the tag labeling method according to any one of claims 1 to 6.
9. A label labeling device, characterized by comprising a storage module, a detection module and an automatic labeling module, wherein:
the storage module is used for storing a pre-recorded entity and a tag type state chain corresponding to the entity, wherein the tag type state chain is used for storing a tagged tag type sequence;
the detection module is used for detecting whether the entity in the text to be labeled is the entity pre-recorded in the storage module, and if the entity is the pre-recorded entity, the detection module informs the automatic labeling module;
and the automatic labeling module is used for receiving the notification of the detection module, acquiring a label type state chain corresponding to the entity recorded in the storage module in advance, and automatically labeling the label type of the entity in the text to be labeled according to the label type state chain.
10. The label labeling apparatus of claim 9, further comprising a recording module, wherein:
the detection module is further configured to detect whether an entity in the text to be labeled updates the tag type and whether the entity updating the tag type is the pre-recorded entity, and send a first notification to the recording module if the entity in the text to be labeled updates the tag type and the entity updating the tag type is not the pre-recorded entity of the storage module; if the entity in the text to be labeled updates the label type and the entity updating the label type is the entity recorded in advance by the storage module, sending a second notification to the recording module;
the recording module is used for receiving the first notice of the detection module and recording the entity and the corresponding label type state chain thereof to the storage module; and receiving a second notification of the detection module, and correspondingly modifying the tag type state chain corresponding to the entity recorded in the storage module according to the updated tag type.
CN201811221612.8A 2018-10-19 2018-10-19 Label labeling method and device and computer readable storage medium Active CN109508382B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811221612.8A CN109508382B (en) 2018-10-19 2018-10-19 Label labeling method and device and computer readable storage medium

Publications (2)

Publication Number Publication Date
CN109508382A (en) 2019-03-22
CN109508382B (en) 2020-08-21

Family

ID=65746753

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811221612.8A Active CN109508382B (en) 2018-10-19 2018-10-19 Label labeling method and device and computer readable storage medium

Country Status (1)

Country Link
CN (1) CN109508382B (en)

Also Published As

Publication number Publication date
CN109508382A (en) 2019-03-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant